NETWORK PARADOXES AND THE INEFFICIENCY
OF NONCOOPERATIVE GAMES

Joel  E.  Cohen 
Rockefeller  University 
1230  York  Avenue,  Box  20 
New  York,  NY  10021 


Summary 

One might think that adding an additional road
to a traffic network would improve, or at least
not worsen, the time travelers take to go from
a given origin to a given destination in the network.
In 1968, D. Braess showed that adding
a road to a congested traffic network can sometimes
worsen the travel time from origin to destination
for all travelers. Analogous surprises
can occur in networks of queues: adding servers
may increase the average time through a network
for all travelers. (Strangely, in queuing networks,
giving travelers more information about
queue lengths may make them worse off than
giving them less information.) These results
are special cases of a general theorem, due to
P. Dubey in 1986: in n-person noncooperative
games with smooth payoff functions, Nash equilibria
are generically Pareto-inefficient. This tutorial
talk will assume no prior background in
the theory of traffic networks, queues or games.
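
The effect Braess described can be checked numerically on a small network. The sketch below uses the customary textbook instance (4000 drivers, two congestible roads with delay cars/100 minutes, two fixed 45-minute roads, and an optional zero-delay shortcut); these numbers and all function names are assumptions chosen for illustration and are not taken from the talk.

```python
# Assumed textbook network: drivers go from s to t via A or via B.
#   s->A and B->t are congestible: minutes = (cars on the road) / 100
#   A->t and s->B are fixed:       minutes = 45
# An optional zero-delay shortcut A->B creates a third route s->A->B->t.


def route_times(top, bottom, shortcut):
    """Travel times (minutes) of the three routes for the given route flows."""
    load_sa = top + shortcut      # cars using s->A
    load_bt = bottom + shortcut   # cars using B->t
    t_top = load_sa / 100 + 45                   # s->A->t
    t_bottom = 45 + load_bt / 100                # s->B->t
    t_shortcut = load_sa / 100 + load_bt / 100   # s->A->B->t
    return t_top, t_bottom, t_shortcut


def is_equilibrium(flows, allowed):
    """True if every used route is as fast as the fastest allowed route."""
    times = route_times(*flows)
    best = min(t for t, ok in zip(times, allowed) if ok)
    return all(
        f == 0 or abs(t - best) < 1e-9
        for f, t, ok in zip(flows, times, allowed)
        if ok
    )


before = (2000, 2000, 0)   # even split, shortcut closed
after = (0, 0, 4000)       # everybody takes the shortcut

time_before = route_times(*before)[0]   # time on the used routes, no shortcut
time_after = route_times(*after)[2]     # time on the used route, with shortcut
```

Under these assumptions the even split is an equilibrium while the shortcut is closed, stops being one as soon as the shortcut opens, and the new equilibrium is slower for every driver.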
Block-decodable Runlength-Limited Codes Via Look-ahead Technique

K.A. Schouhamer Immink


Since the early 1970s, coding methods based on
(d,k)-constrained sequences have been widely used in such high-capacity
storage systems as magnetic and optical disks or tapes.
Properties and applications of (d,k)-constrained sequences, or
runlength-limited (RLL) sequences as they are often called, are
surveyed in [1]. The number of consecutive like symbols in a (binary)
sequence is known as the runlength. A (d,k) RLL sequence is
a sequence of binary symbols characterized by two parameters,
(d + 1) and (k + 1), which stipulate the minimum and maximum
runlength, respectively, that may occur in the sequence. Closely
related to RLL sequences are (d,k) sequences. A binary sequence
is said to be (d,k) constrained if the number of 'zeros' between
any pair of consecutive 'ones' is at least d and at most
k, k > d. A (d,k)-constrained sequence is converted into a (d,k)
RLL sequence by a simple coding step known as
precoding. The 'ones' in the (d,k)-constrained sequence indicate
the positions of a transition 1 → 0 or 0 → 1 of the corresponding
runlength-limited sequence.
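
The precoding step can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: sequences are plain lists of 0/1, the output level before the first symbol is taken to be 0, and the helper names `precode` and `runlengths` are hypothetical.

```python
def precode(dk_bits, start_level=0):
    """Turn a (d,k)-constrained sequence into an RLL sequence.

    Every 'one' toggles the output level; every 'zero' repeats it.
    """
    level = start_level
    out = []
    for bit in dk_bits:
        level ^= bit
        out.append(level)
    return out


def runlengths(bits):
    """Lengths of the maximal runs of equal symbols, in order."""
    runs = []
    count = 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


# A (2,3)-constrained example: 2 and then 3 zeros between consecutive ones.
dk_sequence = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
rll_sequence = precode(dk_sequence)
interior_runs = runlengths(rll_sequence)[1:-1]   # ignore the two edge runs
```

In this example the interior runlengths of the precoded sequence are 3 and 4, that is, one more than the numbers of zeros between consecutive ones, in line with the (d + 1) and (k + 1) parameters above.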

Codes are used to translate source data into the constrained
sequence. Commonly, the source data is partitioned into words
of length m, and under the coding rules, these m-tuples are
translated into n-tuples, called codewords. Popular (d,k) codes
incorporated in disk file systems are the (2,7) and the (1,7) codes
of rate m/n = 1/2 and rate 2/3, respectively [1]. The codes are
designed using the bounded delay method [2] or the ACH sliding-block
code algorithm [3]. The principal feature of a (d,k) code (or
a code for another finite-type constraint) constructed with the sliding-block
code algorithm is that coded sequences can be decoded
by examining a limited number of consecutive symbols without
relying on external state information. As an immediate consequence,
these codes have a limited amount of error propagation.
For example, a single bit error in a received sequence encoded
by the (2,7) sliding-block code propagates over at most four
decoded bits. Blaum [4] showed that the error propagation of
sliding-block codes presents a problem, as it places an extra load
on the error correction circuitry usually used in conjunction with
the (d,k) code.

An alternative to the above sliding-block coding scheme
was proposed by Tang and Bahl [5]. There, the authors use
codes compiled from codewords of fixed length which can be
decoded without knowledge of preceding or succeeding
codewords. Codes with this property, that is, codes that can be
decoded by observing single codewords (the encoding operation
is allowed to be state dependent), will be called block(-decodable)
codes. Evidently, block-decodable codes offer an advantageous
solution relative to sliding-block codes since they
make it easier to preserve a particular mapping between the
source and the code symbols, and, obviously, error propagation
is localized to one decoded m-block. Block-decodable codes are
highly suitable in conjunction with Reed-Solomon error control
codes. In the preferred embodiment of the coding system, the
codewords have a 1-1 correspondence with the elements of the
finite field GF(2^m), thus enabling the construction of, for instance,
a Reed-Solomon code directly over the (d,k)-constrained
codewords. A notable drawback of state-of-the-art block-decodable
code constructions, however, is the fact that at code
rates R = m/n approaching the Shannon capacity of the
(d,k)-constrained channel, the implementations can be fairly
complex, involving long codewords. For example, the minimum
codeword lengths allowing a rate R = 2/3, (1,7) block-decodable
code and a rate R = 1/2, (2,7) block-decodable code are 33 and
34, respectively. Our design approach is based on the observation
that good codes must be constructed on RLL sequences
rather than (d,k) sequences. In the literature, the terms (d,k) sequence
and RLL sequence are usually used as synonyms, and
the design of encoders that generate RLL sequences is almost
always conducted by designing encoders that generate (d,k) sequences
followed by a precoder. It is generally believed that this
strategy does not entail a loss of performance in terms of coder
complexity and error propagation. It will be shown, however,
that it is surprisingly profitable in terms of error propagation to
design RLL encoders directly, i.e. without the intermediate step
of a (d,k)-constrained sequence. The new RLL codes to be discussed
are block decodable, while at the same time they are
simpler to implement than the (d,k) block-decodable codes currently
being used [6].
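
The Shannon capacity referred to above is the base-2 logarithm of the largest eigenvalue of the adjacency matrix of the (d,k) constraint graph. The following sketch estimates it by power iteration; the state labelling (number of zeros since the last one), the iteration count and the function names are assumptions made for this illustration, not part of the paper.

```python
import math


def dk_adjacency(d, k):
    """Adjacency matrix of the usual (d,k) graph with states 0..k.

    State i means "i zeros have been written since the last one".
    """
    size = k + 1
    a = [[0] * size for _ in range(size)]
    for i in range(size):
        if i < k:
            a[i][i + 1] = 1   # write another zero
        if i >= d:
            a[i][0] = 1       # write a one
    return a


def capacity(d, k, iterations=2000):
    """Estimate log2 of the largest eigenvalue by power iteration."""
    a = dk_adjacency(d, k)
    v = [1.0] * len(a)
    growth = 1.0
    for _ in range(iterations):
        w = [sum(x * y for x, y in zip(row, v)) for row in a]
        total = sum(w)
        growth = total / sum(v)      # converges to the largest eigenvalue
        v = [x / total for x in w]   # renormalise to avoid overflow
    return math.log2(growth)
```

For instance, `capacity(1, 7)` and `capacity(2, 7)` come out slightly above 2/3 and 1/2, which is why the rate 2/3 (1,7) and rate 1/2 (2,7) codes mentioned above operate close to capacity.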
Abstract

A variety of techniques have been proposed for the
construction of codes for input restricted channels, including
variable length codewords, codes based on partitioning
the state successor trees, and methods based
on state splitting. In this paper, new methods that
avoid exhaustive search are proposed for partitioning
state successor trees into subtrees termed independent
paths that are used for the coding. First, the approximating
eigenvector algorithm is used to determine the
weights of the states. These weights are then partitioned
into integer parts, which are used to form the
independent paths (IPs). Consistency of the weights
alone is checked in the first phase of the algorithm, and
in the second phase, the partitions are used to form the
IPs and to allot information symbols. These methods
can also be used to determine the sequence of splits
needed for the state splitting technique.

The problem of encoding information to fit constraints has many
applications in magnetic recording and optical communication.
Given a finite-state description of the channel constraints, and a
desired coding rate, one way to obtain a code [1, 2] involves the
partition of a state successor tree (the tree of paths starting at
a particular state) into subtrees with specified properties. For
one class of codes, the subtrees correspond [3] to ones associated
with a set of split states such as obtained via the method of
Adler, Coppersmith and Hassner [4]. It is usually desirable to
find a code corresponding to a tree partition at minimal depth, or
equivalently (for stationary codes) requiring a minimum number
of splits. Moreover, practical considerations generally dictate
that the partition depth D be bounded by a small integer.

One way to proceed is via an exhaustive search over all possible
state splits. We here describe an algorithm which organises the
search somewhat differently, first considering only integer partitions
of state weights in the tree (no matter how obtained), then,
once a complete partition has been obtained, reconstructing potential
candidate tree partitions. We begin with the definition of
an independent path, a central concept of the method.

Definition: An independent path (IP) of length N is a set of
paths, starting at some state σ, with the property that they can
represent one sequence of N input blocks, followed by anything
else.

Consider the state successor tree associated with state σ. The
encoder mapping defines equivalence classes of paths: those that
correspond to the same sequence of input bits of length N are in
the same class. To distinguish between equivalence classes given
only the set of output states then requires that each class correspond
to distinct states at some level of the state successor tree.
The equivalence classes correspond to IPs, and the decodability
requirement is that IPs be distinguishable at some depth D. The
partition of the state successor tree into IPs is based on necessary
conditions for the existence of a code rather than sufficient
conditions as in [4].

Given a partition (i.e. a set of IPs), one may then construct a
code [1, 2]. The number of IPs that start at each state (the weight
of the state) corresponds to a component of an approximating
eigenvector. At each level of the tree, the weight of the node
is split among the different IPs that go through the node. At
leaves of the tree, each state lies entirely within a single IP, and
the entire weight is associated with that IP (a natural partition).
At higher levels, the partition of the weight of a state σ_i must be
a combination of the subweights in the partitions of the successor
states to σ_i, since each IP going through state σ_i corresponds to
a set of subweights of the successor states of σ_i. Thus a search
for a partition of the tree can be reduced to a search for a set of
compatible partitions of the weights of the states of the tree. A
tree partition can be obtained as follows:

1. Choose a rate p/q for the code. Calculate the weights
of the states using the (A^q, 2^p) approximate eigenvector
algorithm [1, 2, 4, 5].

2. Algorithm for partitioning successor trees into IPs.

The UP phase:

• For a specific depth of tree R > 1, starting with a
natural partition at depth t+1, find potential partitions
of weights at depth t that will be compatible with the
partitions calculated at depth t+1.

Stop when a complete partition (1,1,...,1) is obtained
at the root of the tree.

The DOWN phase: a depth-first search for a compatible
partition.

• Starting with a complete partition (1,1,...,1) at the
root, for t = 0,1,...,R-1, choose a partition at depth
t+1 that is compatible with the chosen partition at
depth t.

Stop when a natural partition is obtained at the leaves
of the tree.

3. The partition of the states yields IPs, to which input sequences
may be allotted using the methods given in [2].
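
Step 1 above relies on an (A^q, 2^p) approximate eigenvector, that is, a nonzero nonnegative integer vector v with A^q v ≥ 2^p v componentwise. A minimal sketch of the commonly used iterative procedure (usually attributed to Franaszek) is given below; it is an assumption that this is the variant intended by the cited references, and the function names and the bound `max_component` are hypothetical.

```python
def mat_mul(a, b):
    """Product of two square integer matrices given as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][t] * b[t][j] for t in range(size)) for j in range(size)]
        for i in range(size)
    ]


def mat_pow(a, q):
    """q-th power of a square integer matrix."""
    size = len(a)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(q):
        result = mat_mul(result, a)
    return result


def approximate_eigenvector(a, n, max_component=16):
    """Smallest-start search for a nonzero integer v with a*v >= n*v."""
    for limit in range(1, max_component + 1):
        v = [limit] * len(a)
        while True:
            av = [sum(x * y for x, y in zip(row, v)) for row in a]
            new_v = [min(vi, avi // n) for vi, avi in zip(v, av)]
            if new_v == v:
                break           # fixed point reached
            v = new_v
        if any(v):
            return v            # nonzero fixed point: state weights
    return None                 # nothing found up to max_component


# Example: the (0,1) constraint at rate p/q = 2/3, i.e. matrix A^3 and n = 2^2.
golden_mean = [[1, 1], [1, 0]]
a_cubed = mat_pow(golden_mean, 3)
weights = approximate_eigenvector(a_cubed, 4)
```

In this small example the resulting state weights are 2 and 1; these are the integers that the UP and DOWN phases would then partition.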

By varying the conditions on the successor tree partitions, for
example by permitting a dependence on the path used to reach a
state, there is a potential for more general code structures. The
up phase, using only state weights, provides a means for bounding
the minimum required value of the depth D. For stationary
codes, the algorithm provides an alternative to searching over
all possible state splits, and provides a lower bound on the number
of rounds of state splitting required.
Construction of Polynomial-Size Encoders with Small
Decoding Look-ahead for Input-Constrained Channels

Jonathan J. Ashley*  Brian H. Marcus*  Ron M. Roth†


Input-constrained channels, also known as constrained
systems, are widely-used models for describing
the read-write requirements of secondary storage
systems, such as magnetic disks or optical memory
devices. A constrained system S is defined as the
set of constrained sequences obtained by reading the
labels of paths of a finite labeled directed graph G.

One goal in the study of constrained systems is
designing encoders that map unconstrained binary
sequences, referred to as source sequences, into constrained
sequences of a given constrained system S.
A rate p : q finite-state encoder encodes a p-block of
source symbols to a q-block in S in a state-dependent
manner.

An encoder is lossless of finite order if there is an
integer N such that the encoder state at each time
slot r, together with the q-blocks generated at times
r, r+1, ..., r+N-1, determine uniquely the source
p-block that was input at time slot r. The smallest
number N for which this is possible is called the order
of the encoder. An encoder is sliding-block decodable
if the source sequence which was input to the encoder
can be reconstructed by applying a decoding function
on a 'window' of symbols in the constrained sequence.

Several schemes have been suggested for constructing
finite-state encoders, most notable of which is
the Adler-Coppersmith-Hassner (state splitting) algorithm
[1]. The latter provides encoders which are
lossless of finite order. Furthermore, for the important
subclass of constrained systems of finite memory
(such as (d,k)-run-length-limited systems), the
resulting encoders are sliding-block decodable. However,
there are no known polynomial upper bounds,
in terms of the number of states in a deterministic
graph presentation G, on the window length or
implementation size.

*IBM Research Division, Almaden Research Center,
650 Harry Road, San Jose, CA 95120.

†Computer Science Department, Technion - Israel Institute
of Technology, Haifa 32000, Israel.


In [3], Ashley constructed encoders with order
which is linear in the number k of states in a deterministic
graph G that presents the constrained system
S. The resulting encoders have rate pt : qt for
some t = O(k). When these encoders are translated
into rate p : q encoders, the latter have order O(k),
but typically they are not sliding-block decodable.

In this work, we present a class of encoders, called
stethering encoders, based on a construction of Adler,
Goodwyn, and Weiss in [2]. Using complexity results,
we show that the number of gates and memory cells
required for a hardware implementation of these encoders
is at most polynomial in k. We show that this
also holds for the construction in [3].

Then we show that for any constrained system S
and any positive integers p and q such that p/q <
c(S), stethering encoders have order which is at most
linear in k and is slightly smaller than the one guaranteed
in [3]. We show that the stethering encoders
and those in [3] have polynomial-size decoders.

For constrained systems S of finite memory and
rate p/q < c(S) - ((log_2 e)/(2^p q)), stethering encoders
are sliding-block decodable with window size at most
quadratic in k.
ENUMERABLE MULTI-TRACK (d,k) BLOCK CODES
The University of Arizona
Tucson,  AZ  85721 


Recently, a new class of run-length-limited codes, referred to
as two-dimensional or multi-track modulation codes, has been
developed [1]. These codes are useful in applications such as
digital magnetic recording in which constraints are placed on
both the minimum and maximum number of 0's which occur
between 1's. These constraints are parameterized by d and k,
respectively.

In NRZI encoding, a '1' is represented by a transition
between polarity states while a '0' is represented by no transition.
In order to prevent adjacent transitions from interfering with
each other (intersymbol interference (ISI)), it is necessary to
ensure that some number of '0's occur between '1's. This necessitates
the d constraint. Also, because timing information
is extracted from the data itself, transitions must not be too
far apart. Hence, the k constraint, which limits the number of
consecutive '0's which occur between '1's, is enforced.

In the past, (d,k) codes have been devised such that each
track individually satisfies both constraints. As a result, the
average capacity over all tracks is equal to the capacity of the
single-track (d,k) constraints, where capacity is the theoretical
maximum code rate (ratio of source bits to code bits). However,
by having each track satisfy the d constraint (ISI must
be controlled in each track) but using multiple tracks to satisfy
the k constraint, increased capacity can be realized, relative to
the conventional single-track (d,k) code [1]. This increase in
capacity is realized because, in effect, the k constraint has been
relaxed. For multi-track codes, it is assumed that all n tracks
are read in parallel and used (jointly) to derive clocking, as opposed
to single-track codes in which each track individually is
required to meet the k constraint.

For example, consider the following sequences which satisfy
a two-track (1,2) constraint.

track 1  00001010001010010
track 2  01000001000000100

It is observed that although both tracks individually have runs
of 0's longer than k = 2, both tracks are never 0 simultaneously
more than twice consecutively.
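
The two-track example can be verified mechanically. The sketch below checks the d constraint on each track separately and the k constraint on the tracks jointly; the function names are hypothetical and the tracks are represented as strings of '0' and '1'.

```python
def satisfies_d(track, d):
    """Consecutive 1's within one track are separated by at least d 0's."""
    last_one = None
    for pos, bit in enumerate(track):
        if bit == "1":
            if last_one is not None and pos - last_one - 1 < d:
                return False
            last_one = pos
    return True


def longest_joint_zero_run(tracks):
    """Longest stretch of time slots in which every track holds a 0."""
    longest = run = 0
    for column in zip(*tracks):
        run = run + 1 if all(bit == "0" for bit in column) else 0
        longest = max(longest, run)
    return longest


def satisfies_multitrack(tracks, d, k):
    """d is enforced per track, k is enforced jointly over all tracks."""
    per_track_ok = all(satisfies_d(track, d) for track in tracks)
    return per_track_ok and longest_joint_zero_run(tracks) <= k


track_1 = "00001010001010010"
track_2 = "01000001000000100"
example_ok = satisfies_multitrack([track_1, track_2], d=1, k=2)
```

Calling `longest_joint_zero_run` on a single track gives that track's own longest run of 0's, which exceeds 2 for both tracks even though the joint check passes.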

In this paper, we propose a method to construct multi-track
(d,k) block codes which can be implemented using an enumeration
scheme based on a trellis. While block codes can be implemented
via look-up tables, the amount of memory required for
such an implementation increases exponentially with the block
length. Our method is a computational algorithm which requires
only a linear increase of memory (and computations) with block
length [2].

Enumeration is a process in which the elements of a given set
are assigned an index according to their lexicographical order.
For m-tuples of numbers, a lexicographic ordering can be defined
as follows. For x = (x_0, ..., x_{m-1}) and y = (y_0, ..., y_{m-1}),
x < y if there exists some index i such that x_i < y_i and
x_j = y_j for all j < i. We incorporate a modified trellis description
of the multi-track (d,k) channel constraints to devise easily
enumerable block codes with good code rates.

This work was supported by International Business Machines Corporation
and by the National Science Foundation under Grant No. NCR-9258374.


Consider an output symbol c(l) at time l derived from the
outputs z_i(l) of each of the n tracks as

c(l) = \sum_{i=0}^{n-1} z_i(l) 2^i.   (1)

Hence, a new symbol set A = {0, 1, ..., 2^n - 1} is formed and a
codeword of length m is described by c = (c_0, c_1, ..., c_{m-1}) ∈ C,
where C ⊂ A^m is the set of codewords (chosen to satisfy the
multi-track (d,k) constraints). The goal of our enumeration
scheme is to lexicographically order the set of codewords.

Consider the one-step state transition matrix B_n(d,k) in
which a value of 1 for the (i,j)th element indicates that state
S_j can immediately follow state S_i. This information can also
be summarized in the form of a trellis, indicating allowable state
transitions. In our method, we augment the trellis description
by associating 2^n values N_p^{i(l)}(l) (p = 1, 2, ..., 2^n) with each
node. The argument l represents the time index of the trellis
and the superscript i(l) denotes the index of a state at time l in
the trellis. For a given terminal set, these N_p^{i(l)}(l) describe the
number of sequences of length m - l which begin in that state
with a symbol less than p and end in any terminal state.

The determination of the lexicographic number of a given
codeword is based upon knowing the number of allowable sequences
which begin with each of the elements of A at each
node of the trellis; i.e., the N_p^{i(l)}(l). We associate a unique integer
value with each allowable codeword, with the resulting set of
integer values forming a contiguous set. This set takes the form
{0, 1, ..., |C| - 1}, where |C| is the number of codewords in
the code.

Encoding of an R-bit binary source word begins by first converting
it to its decimal equivalent v. Then a path corresponding
to both v and the N_p^{i(l)}(l) is traversed through the modified trellis.
At the path's completion, the codeword which results is that
whose value in the lexicographically ordered set of all codewords is
equal to v.

Decoding of a given codeword c entails following a path (dictated
by c) through the modified trellis and keeping a running
sum of the N_p^{i(l)}(l) encountered at each node of the path. At the
path's completion, the value of the running sum is equal to the
lexicographic number of c.

An interesting facet of this scheme is that the modified trellis
contains all the relevant information concerning both decoding
and encoding of codewords. Hence, the actual codewords under
this scheme need not be known. They are generated automatically
by the algorithm.
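
The idea of ranking and unranking codewords by counting can be sketched on a much simpler constraint than the multi-track one. The code below is an illustrative assumption, not the trellis of the paper: it handles single-track binary words with no two adjacent 1's (d = 1, no k constraint), and the names `count`, `rank` and `unrank` are hypothetical. The cached `count` plays the role of the numbers stored at the trellis nodes.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count(remaining, prev_bit):
    """Number of valid ways to write `remaining` more bits after `prev_bit`."""
    if remaining == 0:
        return 1
    total = count(remaining - 1, 0)          # a 0 may always follow
    if prev_bit == 0:
        total += count(remaining - 1, 1)     # a 1 may only follow a 0
    return total


def rank(word):
    """Lexicographic index of a valid word (decoding: a running sum)."""
    index = 0
    for pos, bit in enumerate(word):
        if bit == 1:
            # every valid word with the same prefix and a 0 here is smaller
            index += count(len(word) - pos - 1, 0)
    return index


def unrank(index, length):
    """Valid word with the given lexicographic index (encoding)."""
    word = []
    for pos in range(length):
        smaller = count(length - pos - 1, 0)  # words continuing with a 0
        if index < smaller:
            word.append(0)
        else:
            index -= smaller
            word.append(1)
    return word


codebook = [unrank(i, 4) for i in range(count(4, 0))]
```

As in the scheme described above, no table of codewords is stored: encoding walks down the counts and decoding accumulates them, so memory grows with the block length rather than with the number of codewords.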
Abstract: This paper presents an algorithm for run-length-limiting
and charge-constraining binary data. These constraints are
specified by the three parameters (d,k,c). The first two
parameters, d and k, put a lower and an upper bound on the
run-lengths. The third parameter, c, puts an upper bound on the
absolute accumulated charge. An algorithm is optimal if its
maximum average rate equals the capacity of the constraint. The
algorithm that this paper presents, known as the bit stuff algorithm,
is a variable rate algorithm that is both simple and universal. It is
optimal for the (d,∞,∞), the (d,d+1,∞), and the (2c-2,∞,c)
constraints. It is nearly optimal for all other constraints.

Introduction 

In [1], Lee presented an algorithm for sequentially
satisfying the k and c constraints of a (0,k,c) constrained
sequence by inserting bits into an arbitrary data sequence. We
present the bit stuff algorithm for simultaneously satisfying the k
and c constraints of a (d,k,c) constrained sequence by inserting
bits into an arbitrary data sequence.


Jack K. Wolf

Center for Magnetic Recording Research
University of California, San Diego
La Jolla, CA 92093

From the state transition probability matrix, we can find p_i,
the probability of being in a charge state i, where
-c+d+1 ≤ i ≤ c. In terms of these known values, the average
input length is

[equation for the average input length as a function of (p,d,k,c); not recoverable from the source]

and the average output length is

[equation for the average output length as a function of (p,d,k,c), equal to the average input length plus correction terms; not recoverable from the source]

Therefore, the average information rate is the ratio of the average
input length to the average output length.

We maximize the average information rate with respect to p in
order to find the maximum average information rate of the bit stuff
code for a particular (d,k,c) constraint.


The encoder for the (d,k,c) bit stuff algorithm uses two
variables to keep track of the information needed to correctly insert
the extra bits. The first variable, k', keeps track of the current
run-length, where a run is a string of consecutive 0's. If k' is ever
equal to k, then the encoder inserts a 1 to avoid a possible
violation of the k constraint. The second variable, c', keeps track
of the accumulated charge in the opposite direction of the current
run's charge. If c' is ever equal to -c+1, then the encoder inserts
a 1 to avoid a possible violation of the c constraint. Then, after
every 1 the encoder inserts d 0's to avoid a possible violation of
the d constraint.

The  decoder  also  keeps  track  of  the  variables  k'  and  c', 
using  the  values  to  delete  the  extra  bits  inserted  by  the  encoder. 
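
The run-length part of this procedure can be sketched as follows. This is a simplified illustration, not the algorithm of the paper: the charge variable c' is omitted altogether (equivalent to taking c = ∞), the run counter is assumed to start at 0, and the function names are hypothetical.

```python
def bit_stuff_encode(data, d, k):
    """Insert bits so that the output satisfies the d and k constraints."""
    assert 0 <= d < k
    out = []
    run = 0                      # k': current run of 0's

    def emit_one():
        nonlocal run
        out.append(1)
        out.extend([0] * d)      # d 0's after every 1 (d constraint)
        run = d

    for bit in data:
        if bit == 1:
            emit_one()
        else:
            out.append(0)
            run += 1
            if run == k:         # stuff a 1 before the run can exceed k
                emit_one()
    return out


def bit_stuff_decode(code, d, k):
    """Delete the stuffed bits by tracking the same run counter."""
    data = []
    run = 0
    i = 0
    while i < len(code):
        if run == k:             # this 1 and its d 0's were all stuffed
            i += 1 + d
            run = d
            continue
        bit = code[i]
        i += 1
        data.append(bit)
        if bit == 1:
            i += d               # skip the d stuffed 0's
            run = d
        else:
            run += 1
    return data


example_data = [0, 0, 0, 0, 1, 1, 0]
example_code = bit_stuff_encode(example_data, d=1, k=3)
```

Because the decoder updates its counter exactly as the encoder does, it knows which positions hold inserted bits without any side information, at the price of an output length that depends on the data.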

Performance

We model the encoder by a variable length constraint graph
[2, 3]. The states in our graph represent c'. The edges in our
graph represent the allowable runs. When 0 < d+1 ≤ k ≤ 2c-1,
we can represent the graph for such a (d,k,c) constrained sequence
by the (2c-d) × (2c-d) matrix

[matrix A(D_0, D_1); entries not recoverable from the source]

where the superscripts of D_0 and D_1 represent the number of 0's
and 1's respectively.

The capacity of such a graph is C(d,k,c) = -log_2 λ, where
λ is the smallest positive root of the characteristic equation
det(I - A(z,z)) = 0.

In order to get a description of the corresponding data
sequences for the bit stuff encoder, we remove the 0's and 1's
inserted by the bit stuff algorithm from the variable length graph
described by A(D_0, D_1), getting the (2c-d) × (2c-d) matrix

[data-sequence matrix; entries not recoverable from the source]

Then, assuming that our data are independent, identically
distributed binary digits with the probability of a 0 given by p, the
state transition probability matrix for the (d,k,c) bit stuff code is
A(p,1-p).


For the (d,∞,∞) constraint, maximizing the equation for
the average information rate and setting p = λ, we have the
parametric equations for the capacity of a (d,∞,∞) code.

For the (d,d+1,∞) constraint, maximizing the equation for
the average information rate and setting p equal to [expression in λ
not recoverable from the source], we have the
parametric equations for the capacity of a (d,d+1,∞) code.

For the (2c-2,∞,c) constraint, maximizing the equation
for the average information rate and setting a power of p equal to a
power of λ [exponents not recoverable from the source], we have the
parametric equations for the capacity of a (2c-2,∞,c) code.


Now we argue that the remaining cases are sub-optimal.
All bit stuff (d,k,c) constrained sequences that we have not shown
to be optimal are in the category 0 < d+1 < k+1 < 2c-1 ≤ ∞. We
consider two equally probable data sequences [the two sequences are
not recoverable from the source]. If the bit stuff encoder starts in a
particular state [not recoverable from the source],
then it will end in state -c+1 and output two sequences
[not recoverable from the source].
Since both sequences have the same starting and ending state, both
sequences have an identical set of predecessors and successors.
However, the output sequence for the first data sequence is longer
than the output sequence for the second data sequence. Therefore,
in order to maximize the average information rate, the probability
of the first data sequence must be less than the probability of the
second data sequence, which contradicts the fact that the data
sequences are equally probable.

Although the remaining cases are sub-optimal, numerical
maximization of the average information rate shows that they are
nearly optimal.

Conclusions 

We have presented a simple and efficient algorithm for
(d,k,c) constraining data. We have shown that the algorithm is
optimal for (d,∞,∞), (d,d+1,∞), and (2c-2,∞,c) constraints
and nearly optimal for all other (d,k,c) constraints.
A binary word of length n ∈ ℕ is called balanced when it
has ⌈n/2⌉ (⌊n/2⌋) 1's and ⌊n/2⌋ (⌈n/2⌉) 0's. A code C is a balanced
code with r check bits and k information bits iff:

1. C has fixed length n = k + r,

2. each word X ∈ C is balanced,

3. |C| = 2^k.

In [4], Knuth showed that if a balanced code with r check bits
and k information bits exists, then r > (1/2) log_2 k + 0.326; he has
designed serial encoding and both serial and parallel decoding
schemes with k = 2^r and k = 2^r - r - 1 respectively. In both
methods, for each given information word, some appropriate
number of bits, starting from the first bit, are complemented;
then a check is assigned to this modified information word to
make the entire word balanced. In the sequential decoding the
check represents the weight of the original information word,
whereas in the parallel decoding the check directly indicates
the number of information bits complemented.
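
The complementation step underlying both of Knuth's methods can be sketched as follows. The code only finds how many leading bits to complement so that the information word itself becomes balanced; the construction of the check is left out, even word length is assumed, and the function names are hypothetical.

```python
def knuth_balance(word):
    """Return (j, new_word) where complementing the first j bits balances word."""
    n = len(word)
    assert n % 2 == 0, "this sketch assumes an even word length"
    weight = sum(word)            # weight after complementing the first 0 bits
    for j in range(n + 1):
        if weight == n // 2:
            flipped = [1 - b for b in word[:j]]
            return j, flipped + list(word[j:])
        if j < n:
            # complementing one more bit changes the weight by exactly +1 or -1
            weight += 1 - 2 * word[j]
    raise AssertionError("unreachable: a balancing prefix length always exists")


def knuth_restore(balanced, j):
    """Undo the complementation once j is known (e.g. from the check)."""
    return [1 - b for b in balanced[:j]] + list(balanced[j:])


prefix_length, balanced_word = knuth_balance([1, 1, 1, 1, 1, 1, 0, 0])
```

A suitable prefix length always exists because the weight moves in steps of one from w (nothing complemented) to n - w (everything complemented), and n/2 lies between these two values.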

In [1], [2] and [3] improved design methods are given.

In this paper, we divide the set of information words into
two subsets: 1) the subset of words that are close to balanced
and 2) the subset of words that are not close to balanced; then
we encode words in each subset with different methods. More
precisely, given t ∈ ℕ, let (w(X) is the weight of X):

U_t = {X ∈ Z_2^k : 0 ≤ w(X) ≤ t or [second condition not recoverable from the source]}

and:

B_t = Z_2^k - U_t

be the subsets of information words close to balanced and not
close to balanced respectively. The words are made balanced
by encoding U_t using tail-maps and encoding B_t using single
maps defined by Knuth's complementation method.

Three  different  tail-maps  are  given  and  here  one  of  these 
maps  is  briefly  described. 

Tail-map construction 1: Here, the word is divided into $\lceil k/2 \rceil$
two-bit parts and each part is encoded into unary with the function
u. More formally, given $k \in \mathbb{N}$, let:

$X = x_1 x_2 x_3 x_4 \cdots x_{k-2} x_{k-1} x_k = b_1 b_2 \cdots b_{\lceil k/2 \rceil}$,

where:

$b_i = x_{2i-1} x_{2i}$ if $i \in [1, \lfloor k/2 \rfloor]$,  (1)

This work is supported by the grant from National Science Foundation
MIP-9016143. The first author's work is supported by the
Italian National Research Council (CNR 20.3.01 ..59).


Each $b_i$ can be considered as the binary encoding of an integer
number between 0 and 3. In this way X can be identified by
a sequence of $\lceil k/2 \rceil$ integer numbers between 0 and 3 and so we
can encode X using the unary representation of such sequence.
Let:

$U(X) = u(b_1)\,u(b_2) \cdots u(b_{\lceil k/2 \rceil})$,  (2)

and  (U'_o  S^)  U  St  the  tail-map  defined  as 

follows  (A'  is  the  complement  of  X): 

(  t/(A')0‘*-"-'“^'^»0  ifAeU'-o-^*' 

<V'>(A')1?:W  _ 

[  if  A'gU‘=^_,5,'' 

Balanced  code  construction  1:  Given  r  6  IN  ik  >  6,  it  < 
2’^**  —  2  and  t  =  i{k)  1='  [jJ,  the  encoding  scheme  is: 

1. Encode the information words $X \in \left(\bigcup_{i=0}^{t} S_i^k\right) \cup \left(\bigcup_{i=k-t}^{k} S_i^k\right)$
using the tail-map defined above.

2. Encode the other information words using single maps
defined by Knuth's complementation method.

Two other improved tail-maps and balanced codes based
on these maps are also designed in the paper. In particular,
balanced codes with r check bits, $k \le 3 \cdot 2^r - 8$ and
$k \le 5 \cdot 2^r - 10r + \text{Constant}$ ($\text{Constant} \in \{-15, -10, -5, 0, +5\}$)
information bits are designed. In the first two cases the tail-maps
can be computed with a parallel scheme.

Abstract

Irreducible components of canonical diagrams for spectral null
constraints at $f = k f_s / n$ are studied, where k and n are relatively
prime integers with $0 \le k < n$ and $f_s$ is the symbol
frequency.

To identify systematically all irreducible components of the
canonical diagrams for first-order spectral nulls at f, we give a
set of channel symbol sequences specifying all of them. If n is
a prime number, then each sequence in the set corresponds to
exactly one irreducible component up to label-preserving graph
isomorphism. We also give a set of channel symbol sequences
specifying all irreducible components of canonical diagrams for
second-order spectral nulls at dc (i.e., f = 0).

Introduction 

A spectral null constraint requires that channel symbol sequences
should have no frequency content at a specified frequency
f. Codes for the constraint were characterized in terms
of finite directed graphs with labeled edges. Marcus and Siegel
[1], however, have defined canonical diagrams for first-order
spectral null constraints at f, which are countable-state directed
graphs with labeled edges, and they also have shown that
every finite subdiagram of a canonical diagram has a first-order
spectral null at f. Recently, order-K spectral null constraints
have been introduced as extensions of first-order spectral null
constraints and canonical diagrams for them also have been
given. We can construct a spectral null code by choosing an
irreducible finite subdiagram from the canonical diagram and
by applying the code construction schemes given in [4], [2], [3].
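As a rough numerical illustration of what a first-order spectral null at $f = k f_s / n$ means, the sketch below tracks the running sum of $a_i \exp(-2\pi\sqrt{-1}\,k i/n)$ along a symbol sequence. It assumes the usual characterization that such a null corresponds to this running sum staying bounded; that characterization and the function names `running_sums` and `max_excursion` are assumptions, not taken from the text above.

```python
import cmath


def running_sums(seq, k, n):
    """Partial sums of a_i * exp(-2*pi*sqrt(-1)*k*i/n) over the sequence."""
    total = 0j
    sums = []
    for i, a in enumerate(seq):
        total += a * cmath.exp(-2j * cmath.pi * k * i / n)
        sums.append(total)
    return sums


def max_excursion(seq, k, n):
    """Largest magnitude reached by the running sum along the sequence."""
    return max(abs(s) for s in running_sums(seq, k, n))
```

For example, with k = 1 and n = 2 the periodic sequence 1, 1, −1, −1 keeps the running sum within magnitude 1, while the alternating sequence 1, −1 makes it grow linearly with the length.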

When we design a code in such a way, a choice of an irreducible
finite subdiagram from the canonical diagram has an
effect on the code we get. Hence, we must be able to identify
all irreducible finite subdiagrams of the canonical diagram in
order to obtain an optimal code in some sense. However, canonical
diagrams have infinitely many states and configurations of
them are not so simple that we can understand them intuitively.
Moreover, a canonical diagram consists of disjoint irreducible
countable-state subdiagrams. Therefore, in this paper we investigate
irreducible components of a canonical diagram and
give a systematic way to identify all of them.

We assume that the channel symbol alphabet is {1, −1}.

Irreducible Components for First-Order
Spectral Null at f

Let p be an integer with $0 \le p \le n - 1$. Let $G_p^f$ be
a countable-state transition diagram (CSTD) which is period-p
canonical for a spectral null at f [1]. We assume that f > 0
(i.e., $n \ge 2$) because it is trivial that $G_p^0$ is irreducible. In the
case where n is prime, we have identified all irreducible components
of $G_p^f$.

Let  £;  =  -  •  -1 - a}. 

fi»3  limM 

Let $G_{p,0}^f$ be the irreducible component of $G_p^f$ which contains the
state 0. For $a \in E$, let $I_p^f(a)$ be the irreducible component
of $G_p^f$ such that a is generated by a cycle in $I_p^f(a)$. For two
CSTD's I and I', if there is a label-preserving graph isomorphism
of I to I' then we write $I \cong I'$.

Theorem 1 $G_p^f$ is the union of $I_p^f(a)$, $a \in E$, and $G_{p,0}^f$.

Theorem 2 Assume that n is prime. Then

• for every pair of sequences $a, b \in E$ with $a \ne b$ we have
$I_p^f(a) \not\cong I_p^f(b)$;

• for every irreducible component I of $G_p^f$, we have $I \cong G_{p,0}^f$
or $I \cong I_p^f(a)$ for some $a \in E$.

For  every  a  =  ao---ax,-i  G  E,  7^(o)  contains  the  state 
—  exp(— ZTtv/^t/nja;.  Therefore  we  can  generate  all 
irreducible components of $G_p^f$ by applying the state transition
rule for $G_p^f$ recursively.

Irreducible Components for Second-Order
Spectral Null at DC

We have identified all irreducible components of canonical diagrams
for second-order spectral nulls at dc.

Let p be a positive integer. Let $G_p^{(2)}$ be a CSTD which is
period-p canonical for a second-order spectral null at dc [5].
The set of states in $G_p^{(2)}$ is $Z \times Z$, where Z is the set of integers.
Let $\sigma$ be a state in $G_p^{(2)}$. Define a diagram $L_p(\sigma)$ to be a
subdiagram of $G_p^{(2)}$ which consists of all states $\tau$ (and all edges
connected to those states) such that there are paths from $\tau$ to
$\sigma$ or paths from $\sigma$ to $\tau$. Then

Proposition 1 For every state $\sigma$ in $G_p^{(2)}$, $L_p(\sigma)$ is irreducible.

Theorem 3 For every irreducible component I of $G_p^{(2)}$, we have
$I \cong L_p((i,0))$ for some i with $0 \le i \le p - 1$.

Thus we can generate all irreducible components of $G_p^{(2)}$ by
applying the state transition rule of $G_p^{(2)}$ recursively to states
(0,0), (1,0), ..., (p−1,0).

In a conservative code of dimension n, every word has $\lfloor n/2 \rfloor$
transitions. A transition occurs when two adjacent bits are the
complements of each other. The (1 → 0) and (0 → 1) transitions
are treated as indistinguishable.

Conservative codes can be used for bit synchronization in
high-speed communication channels [1]. Non-systematic conservative
codes were introduced and analyzed in [1]. This paper
gives efficient constructions of conservative codes which have
the same efficiency as the balanced codes reported in [3].

Notation:

W(x): the Hamming weight of the binary vector x;
$W(x) = \sum_{i=1}^{n} x_i$.

NT(x): the number of transitions in x, $NT(x) = \sum_{i=1}^{n-1} x_i \oplus x_{i+1}$.

N(n,t): the number of n-bit vectors with t transitions, i.e.
$N(n,t) = |\{x \in \{0,1\}^n \mid NT(x) = t\}|$.

For instance, if x = 1101, then W(x) = 3 and NT(x) = 2.
To find N(4,2) we see that only the words
0010, 0100, 0110, 1001, 1011, and 1101 have 2 transitions;
therefore, N(4,2) = 6.
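The notation above can be checked by brute force. The sketch below is a minimal illustration; the function names are hypothetical and the enumeration is exponential in n, so it is only meant for small examples such as the one in the text.

```python
from itertools import product


def hamming_weight(x):
    """W(x): number of ones in the binary vector x."""
    return sum(x)


def num_transitions(x):
    """NT(x): number of adjacent bit pairs that differ."""
    return sum(a ^ b for a, b in zip(x, x[1:]))


def words_with_transitions(n, t):
    """All n-bit words (as strings, in lexicographic order) with exactly t transitions."""
    return [
        "".join(map(str, x))
        for x in product((0, 1), repeat=n)
        if num_transitions(x) == t
    ]


def count_words(n, t):
    """N(n, t): number of n-bit words with exactly t transitions."""
    return len(words_with_transitions(n, t))
```

Calling `words_with_transitions(4, 2)` lists the six words given above. The counts also agree with the closed form 2·C(n−1, t), since a word is fixed by its first bit and the positions of its t transitions.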

Definition 1: Let $x = x_1 \cdots x_n$ be a binary vector, then
let $x^{(j)}$ denote x where the first j even bits are complemented;
i.e. $x^{(j)} = x_1 \bar{x}_2 x_3 \bar{x}_4 \cdots x_{2j-1} \bar{x}_{2j} x_{2j+1} \cdots x_n$. For instance, if
x = 1000 1101 then $x^{(3)}$ = 1101 1001. □

Properties of $x^{(j)}$:

1. $NT(x^{(j)}) = NT(x^{(j-1)}) + \epsilon$ for $1 \le j < \lceil n/2 \rceil$, where $\epsilon$ =
0, 2, or −2, and $NT(x^{(\lfloor n/2 \rfloor)}) = n - 1 - NT(x)$.

2. For any integer t of the same parity as NT(x) (i.e.
$t \equiv NT(x) \bmod 2$), where $\min(NT(x), n - 1 - NT(x)) \le t \le \max(NT(x), n - 1 - NT(x))$,
there exists a j such that $NT(x^{(j)}) = t$ (where $0 \le j \le \lfloor n/2 \rfloor$).

Given that $NT(x^{(\lfloor n/2 \rfloor)}) = n - 1 - NT(x)$ and that $NT(x^{(j)}) =
NT(x^{(j-1)}) + \epsilon$ where $\epsilon$ = 0, 2, or −2, we see that every integer
of the same parity as NT(x) in the range $\min(NT(x), n - 1 - NT(x)) \le t \le \max(NT(x), n - 1 - NT(x))$
is obtained after the complementation of some even bits. □
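The even-bit complementation and the two properties can be exercised on small words. The following is a minimal sketch with hypothetical function names; positions are 1-based as in Definition 1.

```python
def num_transitions(x):
    """NT(x): number of adjacent bit pairs that differ."""
    return sum(a ^ b for a, b in zip(x, x[1:]))


def complement_even_bits(x, j):
    """Return x^(j): x with the bits at 1-based positions 2, 4, ..., 2j complemented."""
    y = list(x)
    for pos in range(2, 2 * j + 1, 2):
        y[pos - 1] ^= 1
    return y


def reachable_counts(x):
    """Transition counts NT(x^(j)) for j = 0, 1, ..., floor(n/2)."""
    return [
        num_transitions(complement_even_bits(x, j))
        for j in range(len(x) // 2 + 1)
    ]
```

For x = 1000 1101, `complement_even_bits(x, 3)` gives 1101 1001, and the last entry of `reachable_counts(x)` equals n − 1 − NT(x), because complementing every even bit flips the status of every adjacent pair.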

Code  Design:  Before  starting  the  construction,  one  more 
definition  is  required. 

Definition 2: A complement check (c-check), C, consists
of two r-bit vectors and a positive or negative sign as follows:
$C = \{c_0, c_1\}^{\pm}$ where $c_0 = \bar{c}_1$ and the leftmost bit of $c_0$ is 0
(and the leftmost bit of $c_1$ is 1). Furthermore, let

$NT(C) = NT(c_0)$ if C has a negative sign,
$NT(C) = NT(c_0) + 1$ if C has a positive sign.

The set $\{0,1\}^r$ of check symbols is grouped into $2^r$ c-checks.
For example, the c-checks obtained from $\{0,1\}^2$ are $\{00, 11\}^-$,
$\{00, 11\}^+$, $\{01, 10\}^-$, $\{01, 10\}^+$.
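The grouping of check symbols into c-checks can be enumerated directly. This is a minimal sketch; the tuple layout `(c0, c1, sign, NT(C))` and the function names are illustrative choices, not notation from the text.

```python
from itertools import product


def num_transitions(x):
    """NT(x): number of adjacent bit pairs that differ."""
    return sum(a ^ b for a, b in zip(x, x[1:]))


def c_checks(r):
    """List all c-checks for r check bits as (c0, c1, sign, NT(C)) tuples."""
    checks = []
    for tail in product((0, 1), repeat=r - 1):
        c0 = (0,) + tail                   # leftmost bit of c0 is 0
        c1 = tuple(1 - b for b in c0)      # c1 is the complement of c0
        for sign in ("-", "+"):
            # A positive sign contributes one extra transition at the boundary.
            nt = num_transitions(c0) + (1 if sign == "+" else 0)
            checks.append((c0, c1, sign, nt))
    return checks
```

There are $2^{r-1}$ complementary pairs and two signs per pair, which gives the $2^r$ c-checks stated above; for r = 2 the four entries match the example.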

This work is supported by the grant from KFUPM and National
Science Foundation MIP-9016143.


Figure 1: Conservative code structure of Construction I. (The information word $x_1 x_2 \cdots x_k$ is transformed and followed by a check $c_i$ to form the code word.)


These c-checks will be used to construct a conservative code.
When a c-check is appended to an information vector, it is possible
to generate (or not to generate) a transition between the
information vector and the c-check depending on whether $c_0$
or $c_1$ was appended. A c-check carries a positive (or negative)
sign depending on whether it is always used to generate (or not
to generate) a transition at the time of encoding.

Let $R_t$ denote the set of information vectors with t transitions;
i.e. $R_t = \{x \in \{0,1\}^k \mid NT(x) = t\}$.

The encoding and decoding process is captured in the following
notation:

$C :: R_{t_1} \cup R_{t_2} \cup \cdots \cup R_{t_q} \to R_t$.

This notation means that the information words with $t_1, t_2, \ldots,$
and $t_q$ transitions are transformed to some k-bit vector with
t transitions by complementing some even bits. One of the
two check symbols in C will then be appended to the right
end of the transformed word to obtain the final code word, as
depicted in Fig. 1. The vector $c_0$ is appended if $z_k = 0$ and
C has a negative sign or $z_k = 1$ and C has a positive sign, where
$z_k$ is the last bit of the transformed word;
otherwise, $c_1$ is appended. This allows the creation of one
(or zero) transition when C has a positive (or negative) sign.
To obtain a conservative code, the final code word must have
$\lfloor (k + r)/2 \rfloor$ transitions; therefore, all maps must satisfy

$NT(C) + t = \lfloor (k + r)/2 \rfloor$.  (1)
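The rule for choosing between the two check vectors can be sketched as below. This is a minimal illustration with hypothetical names; it only covers the appending step, not the selection of the map or the even-bit complementation.

```python
def num_transitions(x):
    """NT(x): number of adjacent bit pairs that differ."""
    return sum(a ^ b for a, b in zip(x, x[1:]))


def append_check(z, c0, sign):
    """Append c0 or its complement c1 to the transformed word z.

    c0 must start with 0. A "+" sign forces a transition at the boundary
    between z and the check; a "-" sign forbids one.
    """
    c1 = [1 - b for b in c0]
    last = z[-1]
    use_c0 = (last == 0 and sign == "-") or (last == 1 and sign == "+")
    return list(z) + (list(c0) if use_c0 else c1)
```

For instance, with k = 4 and r = 2 the code word needs ⌊6/2⌋ = 3 transitions. The word z = 1011 has t = 2, so a c-check with NT(C) = 1 is required; `append_check([1, 0, 1, 1], [0, 0], "+")` yields 101100, which has 3 transitions.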


In decoding, the check symbol and the presence (or absence)
of a transition between the information and the check are used
to obtain the original information vector. If yc is the received
code word, then c and the presence/absence of a transition
between y and c will uniquely determine the c-check (C) and
hence its associated map, say $C :: R_{t_1} \cup \cdots \cup R_{t_q} \to R_t$. The
even bits of y will then be complemented until $NT(y^{(j)})$ is equal
to $t_1, t_2, \ldots,$ or $t_q$.

Based on these concepts, codes with k up to $2^{r+1} - r - 1$
using r check bits are designed.

Abstract

In a decentralized detection scheme, we consider adaptive
Order Statistic (OS) thresholding for local decisions: each
local detection threshold is a linear combination of ranked
samples from the reference window centered on the spatial
cell being probed. Letting $\underline{c}_m$ denote the vector of coefficients of
the linear combination at the m-th node, $m = 1, \ldots, M$,
how should the $\underline{c}_m$'s be chosen in order to maximize the
overall power of a given fusion rule for a fixed overall type-I
error probability? Assuming exponential observations, we
solve this problem for AND and OR fusion rules, and we
compare the respective performance.

Problem  formulation 

Decentralized detection is based on the concept of data fusion:
local decisions taken at spatially separated remote
sensors are transmitted to a central processor to make
a binary decision about the state of the sensed environment.
In practical situations, the environment is sequentially
scanned in time-space on a cell-by-cell basis, and a
decision is made for each cell. Adaptive processing is required
to track various changes of the disturbance. Examples
are presented in [1,2]: a common model of disturbance
is assumed at each sensor, and adaptation is accomplished
by estimating a distributional parameter based on a reference
set of cells surrounding the cell under test.

Estimation procedures based on Order Statistics (OS)
are preferable in regard to robustness against possible
non-homogeneities in the reference set. We assume that
the local estimates of the disturbance activity are achieved
as linear combinations of OS's, and that each local decision
is based on the logical variable

$h_m = u\left(Z_m - \gamma_m\,\underline{c}_m^T \underline{X}_m\right), \quad m = 1, \ldots, M,$

where u(·) is the unit-step function, $Z_m$ is the observation
from the cell being tested available at the m-th sensor,
$\underline{X}_m$ is a ranked version of the $N_m$ samples from the reference
set of the m-th sensor, $\gamma_m$ is a threshold coefficient
determining the local type-I error probability, $\underline{c}_m$ is a set
of coefficients of size $N_m$ (if censoring is adopted, the last
$r_m$ entries of $\underline{c}_m$ are zero). The final decision is
taken according to a preassigned fusion rule of the $h_m$'s,
e.g. AND, OR, which determines the global performance.

Optimisation  of  the  local  disturbance  estimators  is  of 
crucial  importance  if  the  inherent  detection  loss  due  to 
the  adaptive  processing  is  to  be  kept  at  a  minimum.  This 
requires  a  statistical  model  for  the  observables. 

We assume that the sample from the cell being tested
is exponential with parameter $\sigma_{m1}$ (alternative hypothesis)
or $\sigma_{m0}$ (null hypothesis), while the samples in the reference
sets are independent exponential variates with parameter
$\sigma_{m0}$. The local error probabilities are

$\alpha_m = \prod_{h=1}^{N_m} \frac{1}{1 + \gamma_m C_{mh}}, \qquad \beta_m = 1 - \prod_{h=1}^{N_m} \frac{1 + S_m}{1 + S_m + \gamma_m C_{mh}},$


where  Cmk  =  +  1  -  h)  and  where 

The problem is to maximize, with respect to the
$\underline{c}_m$'s, the overall power $1 - \beta$ for constrained $\alpha$.


AND  fusion  rule 

The global performance reduces to mere products of local
$\alpha_m$'s and $\beta_m$'s. Lagrangian maximization then yields the
optimum $\underline{c}_m$ in terms of products $\gamma_m C_{mh}$, that are solutions
to the system of equations

$\frac{1 + \gamma_1 C_{11}}{1 + S_1 + \gamma_1 C_{11}} = \frac{1 + \gamma_m C_{mh}}{1 + S_m + \gamma_m C_{mh}}.$

For constant $S_m$, say $S_m = S$, the solution is $\gamma_m C_{mh} = A$,
say, and the optimum performance is

with  Neq  =  That  is,  the  same  performance 

of  a  single  detector  whose  threshold  is  adapted  through  a 
minimum  variance  estimate  based  on  a  reference  set  of  Ngq 
i.i.d.  exponential  variates. 


OR  fusion  rule 

The global performance can still be expressed in simple
terms of the $\alpha_m$'s and the $\beta_m$'s, but the equations for the
optimum coefficients can only be solved through numerical
techniques. A suboptimal approach leading to analytical
solution is to minimize the individual $\beta_m$ with respect to
the $C_{mh}$'s (whence $C_{mh} = 1/(N_m - r_m)$), and subsequently
to minimize $\beta$ with respect to the $\gamma_m$'s. In particular,
assuming $S_m = S$ and $N_m - r_m = L$ leads to

$\gamma_m = L\left[\left(1 - (1 - \alpha)^{1/M}\right)^{-1/L} - 1\right] = \gamma,$

say, whence,

$\beta = \left[1 - (1 + S)^L / (1 + S + \gamma/L)^L\right]^M.$
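The two closed-form expressions for the suboptimal OR-rule design can be evaluated numerically. The sketch below simply transcribes them; the function names are hypothetical, and S is taken to be the common value of the $S_m$ as in the text.

```python
def or_rule_threshold(alpha, M, L):
    """Common threshold coefficient gamma for M sensors under the OR rule.

    Each sensor is given the local false-alarm level 1 - (1 - alpha)**(1/M),
    so that the OR of M independent decisions has overall level alpha.
    """
    local_alpha = 1.0 - (1.0 - alpha) ** (1.0 / M)
    return L * (local_alpha ** (-1.0 / L) - 1.0)


def or_rule_miss_probability(alpha, S, M, L):
    """Overall miss probability beta with S_m = S and N_m - r_m = L."""
    gamma = or_rule_threshold(alpha, M, L)
    local_detection = ((1.0 + S) / (1.0 + S + gamma / L)) ** L
    return (1.0 - local_detection) ** M
```

As a sanity check, setting S = 0 makes the local detection probability equal to the local false-alarm level, so the overall miss probability reduces to 1 − α.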
Abstract

The performance of a parallel distributed detection system is investigated
as the number of sensors tends to infinity. It is assumed
that the i.i.d. sensor data are quantized locally into m-ary messages
and transmitted to the fusion center for Bayesian binary hypothesis
testing. Large deviations techniques are employed to show that
the equivalence of absolutely optimal and best identical-quantizer
systems is not limited to error exponents, but extends to the actual
Bayes error probabilities up to a multiplicative constant. This is true
as long as the two hypotheses are mutually absolutely continuous; no
further assumptions, such as boundedness of second moments of the
post-quantization log-likelihood ratio, are needed.

Summary 

Consider a parallel distributed system consisting of n geographically
dispersed sensors, noiseless one-way communication links, and a
fusion center. Each sensor makes an observation (denoted by $Y_i$) of a
random source, quantizes $Y_i$ into an m-ary message $U_i = g_i(Y_i)$, and
then transmits $U_i$ to the fusion center. Upon receipt of $(U_1, \ldots, U_n)$,
the fusion center performs a binary hypothesis test ($H_0$ against $H_1$)
about the nature of the random source. A Bayesian setup is assumed
throughout, and the Bayes error probability is denoted by
$\gamma_n(\pi)$, where $\pi$ is the prior probability of $H_0$.

It was shown by Tsitsiklis [1] that even when the observations are
i.i.d., the optimal m-ary quantizers $g_i$ need not be identical. Thus the
absolutely optimal system (with Bayes error probability $\gamma_n^*(\pi)$) does not,
in general, coincide with the best identical-quantizer system (with Bayes
error probability $\tilde{\gamma}_n(\pi)$). Since the latter is much easier
to design than the former, it is natural to seek an estimate of the
performance loss resulting from using identical quantizers.

Tsitsiklis supplied a result of this type in the i.i.d. case by showing,
under a fairly general assumption, that the two systems are
asymptotically exponentially equivalent. More precisely, if P and Q
are mutually absolutely continuous distributions of the i.i.d. observations
under $H_0$ and $H_1$ respectively, then

(i.e., the two error exponents coincide) provided that the second
moments, under P and Q, of the post-quantization log-likelihood
ratio $\log[P_g(U)/Q_g(U)]$ are bounded as the quantizer mapping g
varies. The optimal error exponent is the supremum (over g) of the
Chernoff exponent associated with the m-ary post-quantization distributions
$P_g$ and $Q_g$. It has also been shown that this supremum is
achieved by a $g^*$ taken from the class of deterministic likelihood-ratio
quantizers; and that such quantizers are optimal in the nonasymptotic
(fixed n) setting.

Two  questions  arose  from  Tsitsiklis’  work: 

1. Is the aforementioned boundedness assumption really necessary?

2. Does the nonnegative quantity $\log[\tilde{\gamma}_n(\pi)/\gamma_n^*(\pi)]$ admit an upper
bound tighter than o(n) (implied by the equality of error
exponents)?

Research supported by the Institute for Systems Research (a National Science
Foundation Engineering Research Center) at the University of Maryland, College
Park.


Tsitsiklis [2] conjectured that the answer to the first question
is negative, and in this paper we give a proof of his conjecture. As
to the second question, we show that the upper bound o(n) on
$\log[\tilde{\gamma}_n(\pi)/\gamma_n^*(\pi)]$ can be tightened to O(1); hence the ratio
$\tilde{\gamma}_n(\pi)/\gamma_n^*(\pi)$ of the best identical-quantizer error probability to the
absolutely optimal one is bounded from above (trivially, it is also
lower-bounded by unity). We therefore have:

Theorem. If P and Q are mutually absolutely continuous, then for
all $\pi \in (0, 1)$,

$\limsup_{n \to \infty} \frac{\tilde{\gamma}_n(\pi)}{\gamma_n^*(\pi)} < \infty.$


We employ large deviations techniques for proving this theorem.
Using a refinement (due to Esseen) of the central limit theorem for
independent but not identically distributed summands, we show the
following: if all quantizers in the optimal system are "regular," in
that they yield, at their output, log-likelihood ratios that satisfy
certain uniform boundedness constraints, then the Bayes error probability
$\gamma_n^*(\pi)$ of the optimal system can be lower-bounded by
$c\,n^{-1/2}\exp\{-\rho_m n\}$, where $\rho_m$
is the optimal error exponent. The same expression (only with a
larger value of c) is also an upper bound on the error probability
$\tilde{\gamma}_n(\pi)$ of the best identical-quantizer system, so the conclusion
that $\tilde{\gamma}_n(\pi)/\gamma_n^*(\pi)$ is bounded from above is close at hand. It
remains to show that the number of "irregular" quantizers in the
optimal system is bounded. As it turns out, these quantizers yield
an error exponent smaller than $\rho_m$, and thus heuristically, they can
only exist in small numbers. We give rigorous proof to this fact using
a technical argument based on conditioning.

Our simulations of Bayesian distributed detection have shown
that the ratio $\tilde{\gamma}_n(\pi)/\gamma_n^*(\pi)$ of the best identical-quantizer error
probability to the optimal one is in many instances close to unity. It is
quite possible that under conditions as yet unknown to us, the ratio
tends to unity as n approaches infinity. This, however,
is not true in general, and we give a counterexample in which the
ratio is greater than r > 1 infinitely often in n.

Abstract

Constant false alarm rate (CFAR) detection techniques have
been of much interest in radar and sonar applications. In this
paper we consider CFAR detection in a decentralized, multisensor
context and show some interesting characteristics of cell averaging
(CA) in this setting. We investigate cases with observations which
are dependent from sensor to sensor, for which results have been
lacking. The in-phase and quadrature components of the received
narrowband observation at each sensor consist of a weak random
signal in additive clutter and noise. Each sensor transmits a single
binary decision to a fusion center which uses either an AND or an
OR fusion rule to develop a final decision. The forms of the best
sensor detector rules are found and a set of necessary conditions
are given for their thresholds. Solutions are obtained for some
representative cases and their detection probability performance is
studied. Their ability to maintain constant false alarm probability
in the face of clutter edges is also studied. We show that solutions
which use different fusion rules may excel for each of these different
criteria.

I.  Introduction 

Previous studies of cell-averaging constant false alarm rate (CA-CFAR)
distributed detection techniques have focused on cases
with independent observations [1, 2]. Here we investigate some
cases with dependent observations from sensor to sensor. For
our model of the dependency between the narrowband returns received
at the remotely located sensors, we initially assume the
in-phase and quadrature signal components of the returns have a
jointly Gaussian probability density function (pdf). This appears
to be a reasonable extension to the common radar return model
of Swerling type I target fluctuations often used for the single sensor
case. The narrowband signal components are observed in the
presence of additive noise and clutter with the combined noise and
clutter observations initially assumed to be Gaussian distributed
and independent from sensor to sensor. The power of the combined
noise and clutter observations at each sensor is unknown
and so a CA-CFAR scheme is employed at each sensor.

Due to the typical structure of radar and sonar receivers, our
decisions will be based on processing the envelope of the observed
returns. Our signal detection schemes will be optimized
for the case of weak signals so we use locally optimum detection
techniques [3]. While we provide some results for cases with N
sensors, due to the complexity of the problem we focus on the
two-sensor case with sensor decisions based on a single observation,
augmented with $N_j$ reference samples taken at each sensor
j = 1, 2. The two sensors are each constrained to transmit only
a single binary decision to a fusion center and we consider only
non-randomized fusion rules; specifically, the AND and the OR
fusion rules.
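To make the setting concrete, the sketch below shows a textbook single-cell CA-CFAR decision at each sensor followed by AND or OR fusion. It is not the locally optimum sensor rule studied here: it assumes square-law detected samples that are i.i.d. exponential under noise only, for which the scale factor T = N(P_fa^(−1/N) − 1) gives false alarm probability P_fa with N reference cells, and it assumes noise that is independent from sensor to sensor. All function names are hypothetical.

```python
def ca_cfar_scale(p_fa, n_ref):
    """Scale factor T with P(cell > T * mean(reference)) = p_fa
    for i.i.d. exponential samples (assumed noise-only model)."""
    return n_ref * (p_fa ** (-1.0 / n_ref) - 1.0)


def ca_cfar_decision(cell, reference, p_fa):
    """Binary sensor decision: compare the test cell with the scaled cell average."""
    n_ref = len(reference)
    threshold = ca_cfar_scale(p_fa, n_ref) * sum(reference) / n_ref
    return cell > threshold


def fuse(decisions, rule):
    """Non-randomized fusion of binary sensor decisions."""
    if rule == "AND":
        return all(decisions)
    if rule == "OR":
        return any(decisions)
    raise ValueError("rule must be 'AND' or 'OR'")


def fused_false_alarm(p1, p2, rule):
    """Overall false alarm probability for two sensors with independent noise."""
    if rule == "AND":
        return p1 * p2
    if rule == "OR":
        return 1.0 - (1.0 - p1) * (1.0 - p2)
    raise ValueError("rule must be 'AND' or 'OR'")
```

Under these assumptions the AND rule lets each sensor operate at a much higher local false alarm probability than the OR rule for the same overall level, which is one reason the two rules can rank differently under different criteria.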

This material is based upon work supported by the National Science
Foundation under Grant No. MIP-9211298 and by the Air Force Office of
Scientific Research under Grant 90-0050.


II.  Summary  of  Results 

The necessary conditions for the locally optimum (LO) sensor
detector thresholds for either an AND or an OR fusion rule have
been obtained for N-sensor cases. Example solutions have been
found for some representative two-sensor cases. The LO AND rule
solution was found to always use equal thresholds at each sensor
detector. The LO OR rule solution used equal thresholds if the
number of reference cells at each of the sensors was the same,
but different thresholds if the number of reference cells at the two
sensors was different. We found that the AND rule solution was
better than the OR rule solution for a range of small false alarm
probabilities and that this range tended to shrink as the number
of reference samples used at each sensor was increased.

We have also provided some analysis and results on the ability
of our two-sensor distributed CFAR detection systems to maintain
constant false alarm probability in the presence of clutter edges
across their groups of reference samples. We considered the LO
schemes using the AND and OR fusion rules for a particular false
alarm probability. We found that the OR rule scheme was superior
to the AND rule scheme if both sensors had the same clutter edge
applied to them. The detection probability of one of our two-sensor
LO distributed CA-CFAR detection schemes was shown
to be larger than that for a single sensor CA-CFAR scheme for
cases with a wide range of signal-to-noise ratios and dependent
observations from sensor to sensor.

Cases with non-Gaussian signals, noise, and clutter were also
considered for the particular case where the pdf of the combined
noise and clutter contains an unknown scale parameter. We suggest
schemes which use sensor tests which are LO [3] among all
tests satisfying a specific constraint which ensures a CFAR test.
This technique generates tests which can be considered a generalization
of the CA-CFAR scheme for non-Gaussian combined
noise and clutter pdfs since the CA-CFAR scheme is generated by
assuming a Gaussian pdf for the combined noise and clutter observations.
The form of the CFAR sensor test statistics was found
for a specific example with a Cauchy noise pdf. The weak-signal
performance of our scheme was shown to be better than that of a
CA-CFAR scheme for a single sensor case with Cauchy noise.

Consider two surveillance systems: one using the visible-band
satellite (called the primary) and the other using the infrared
(called the secondary). The primary system, because of its shorter
wavelength, gives more accurate and detailed results than the secondary.
Yet, it is more susceptible to atmospheric interference
(e.g., clouds). This complementary aspect suggests integration of
the two systems. Performance improvement by the integration
depends on how they are combined and, even if the improvement
is substantial, cost of the integration may not justify it. Thus, it
is desirable to have some rational methods of integration and to
evaluate the merit of such integration (i.e., performance improvement
vs. cost). By using a simple example, we illustrate how this
might be done.

The primary system consists of two-dimensional sampled data,
say, an n × n data array, and a signal processing algorithm to
detect and localize a target. The data contain a target signal,
if present, the usual white background noise and localized random
disturbances representing cloud coverage. In the absence of
clouds, the optimum processor is the matched filter and the detection
and localization is done by maximizing the matched filter
output with respect to the possible target location and thresholding
it. When random clouds appear, they overshadow the target
and severely reduce the effective S/N. With the use of a Poisson-distributed
cloud model, we derive the log-likelihood ratio which
consists of the matched filter and a "cloud remover". The latter
involves a linear combination of exponentials in the data and
requires far more computation than the former. However, it significantly
reduces the effect of the cloud. The secondary system
consists of similar two-dimensional data covering the same physical
area but with coarser sampling, say, an m × m data array with
m < n. Unlike the primary data, the random disturbances are
assumed negligible. Hence, the optimum processor consists of
a matched filter only.
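The cloud-free processing step, maximizing the matched filter output over candidate target locations and then thresholding it, can be sketched as follows. This is a minimal illustration on nested lists, assuming a known target template and white noise; the function names and the threshold value are hypothetical, and the "cloud remover" term is not modelled.

```python
def matched_filter_output(data, template):
    """Correlate a 2-D template with a 2-D data array at every valid offset."""
    n_rows, n_cols = len(data), len(data[0])
    t_rows, t_cols = len(template), len(template[0])
    output = {}
    for r in range(n_rows - t_rows + 1):
        for c in range(n_cols - t_cols + 1):
            output[(r, c)] = sum(
                data[r + i][c + j] * template[i][j]
                for i in range(t_rows)
                for j in range(t_cols)
            )
    return output


def detect_and_localize(data, template, threshold):
    """Return (detected, location): maximize over location, then threshold."""
    output = matched_filter_output(data, template)
    location = max(output, key=output.get)
    return output[location] > threshold, location
```

On a noise-free 5 × 5 array that contains the template at offset (2, 1), the maximum of the filter output is attained at that offset, so the sketch reports a detection there whenever the threshold is below the template energy.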

The simplest integration of the two systems is the "decision-decision"
integration where the decisions of the individual systems
are combined to make a joint decision. If both systems agree on
the target presence and the location, this integration is satisfactory
and the cost is minimal. If they disagree, some hierarchical
order must be established to arrive at a joint decision. In the
event of a weak target, if both decide that there is no target in
spite of its presence, the joint decision will be "no target" and
the target will be missed. The most thorough integration is the
"data-data" integration where the secondary data are combined
with the primary and they are jointly processed to produce a single
decision. With the optimum processing this integration gives
the highest performance improvement, though the cost of integration
is also the highest since the entire data, instead of the
decisions only, are combined and processed. Between these two,
there is a "data-decision" integration where the decision of the
secondary system is combined with the data of the primary and
they are jointly processed to produce the final decision. This represents
a compromise between the performance improvement and
the additional cost. Such an integration is advantageous when
the secondary system has high S/N. Then the detection is virtually
done by the secondary and its localization result specifies a
small region in the primary data where the target is likely to be
found. On the other hand, if the secondary S/N is low, the decision
by the secondary may mislead the primary. Thus we need a
criterion for deciding when and how to integrate the two systems.

For measuring the performance improvement by integration,
we introduce “dB-gains” in detection and localization (or resolution),
which are the gains in the equivalent S/N measured in dB. The
equivalent S/N for detection is defined as the S/N required by the
matched filter to achieve the same detection probability (Pd) in
the absence of clouds given the same false-alarm probability (Pf).
The equivalent S/N for resolution is defined as follows. Ideally, Pd
at the target location should be 1.0 and it should drop to the
preassigned Pf value everywhere else. Calculate the average slope of
this peak (Pd = 1.0) with respect to everywhere else (Pd = Pf)
and use this as a reference. Define the global resolution as the
ratio of the average slope calculated from the actual values of Pd
at all locations to this ideal average slope. There are two types
of  cost  incurred  by  integration:  the  one  which  increases  with  the 
size  of  data  and  the  other  which  is  constant,  such  as  fixed  over¬ 
head  cost.  We  divide  the  first  into  the  cost  of  additional  storage 
space  and  the  cost  of  additional  computation  load.  As  unit-free 
relative  figures,  we  choose  the  ratio  of  the  cost  with  the  integra¬ 
tion  to  the  cost  without  the  integration  on  all  three  elements: 
storage  space,  computation  load  and  overhead.  Then  we  average 
these  three  ratios  with  appropriate  weighting.  For  example,  in 
the  decision-decision  integration  the  additional  space  is  for  the 
information  on  the  location  of  a  target  obtained  by  the  secondary 
system and the additional computation load is for combining this
information  with  a  similar  one  obtained  by  the  primary.  This 
may  be  equivalent  to  the  space  and  time  needed  for  handling  one 
data  point.  In  the  data-data  integration,  however,  the  additional 
space is for the entire secondary data, which are m × m = $m^2$
data points. Thus, the space ratio is $(n^2 + m^2 + m^2)/(n^2 + m^2)$.
On  the  other  hand,  the  computation  load  is  not  increased  since 
the  matched  filtering  on  the  secondary  data  is  done  anyway  with¬ 
out  the  integration.  In  the  case  of  the  data-decision  integration, 
the space ratio is about the same as in the case of the decision-decision
integration, but the time ratio actually decreases by the
factor $((n/m)^2 + m^2)/(n^2 + m^2)$ since only the section of the primary
data corresponding to the declared target location in the
secondary  data  is  processed. 
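
As a rough illustration of the cost bookkeeping described above, the following Python sketch evaluates the two size-dependent ratios and a weighted average cost. It assumes that the space ratio for the data-data integration is (n^2 + 2 m^2)/(n^2 + m^2) and that the time factor for the data-decision integration is ((n/m)^2 + m^2)/(n^2 + m^2), as read from the text. The function names, the equal default weights and the example dB-gain are hypothetical and are not taken from the paper.

```python
def space_ratio_data_data(n, m):
    """Storage with integration over storage without (assumed reading of the text)."""
    base = n * n + m * m          # primary plus secondary data held separately
    return (base + m * m) / base  # one extra copy of the secondary data


def time_ratio_data_decision(n, m):
    """Computation factor when only one (n/m) x (n/m) primary section is processed."""
    return ((n / m) ** 2 + m * m) / (n * n + m * m)


def average_cost(space, time, overhead, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of the three unit-free cost ratios (weights are illustrative)."""
    w_space, w_time, w_overhead = weights
    return w_space * space + w_time * time + w_overhead * overhead


def figure_of_merit(db_gain, cost):
    """Ratio of the dB-gain to the average cost."""
    return db_gain / cost


if __name__ == "__main__":
    n, m = 64, 16
    cost = average_cost(space_ratio_data_data(n, m), 1.0, 1.0)
    # A 3 dB gain is a made-up number used only to exercise the functions.
    print(figure_of_merit(3.0, cost))
```

With n = 64 and m = 16 the data-decision time factor is 272/4352 = 1/16, which shows how strongly the restricted search region reduces the computation load under these assumptions.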

As the figure of merit which evaluates the relative merit of
integration, we propose the ratio of the dB-gain to the average
cost for both detection and resolution. In the paper, this ratio
is numerically evaluated (via Monte Carlo simulation) for six
detection-localization algorithms utilizing the data-data and the
data-decision integrations for various values of parameters (e.g.,
occurrence probability of cloud, attenuation constant of cloud and
reflected-light intensity from cloud).

Summary. There are many instances in detector design where
one cannot implement the minimum probability of error detector
for deciding between measures P0 and P1. As such, there
has been increasing interest in the development of design techniques
for determining “good” suboptimal detection strategies.

One  effective  approach  to  designing  detection  strategies 
has  been  the  optimization  of  a  statistical  distance  measure  be¬ 
tween  competing  hypotheses.  The  advantage  of  this  approach 
over  minimizing  the  total  probability  of  error  is  that  these  dis¬ 
tance  measures  can  be  computed  while  for  most  problems,  the 
minimum  error  rate  is  analytically  intractable.  Unfortunately, 
system  parameters  derived  in  this  manner  are  not  necessarily 
optimal in the minimum probability of error sense. To address
this issue, we develop sufficient conditions under which
solutions obtained by optimizing arbitrary distance measures
result in the minimum probability of error detector over a
chosen  class  of  detectors. 

We restrict our attention to the general class of measures
of discrimination between probability measures P0 and
P1 known as f-divergences [3] or Ali-Silvey distances [1].
Mathematically, these distance measures are given by

$$d(P_0, P_1) = h\big(E_0[C(L)]\big),$$

where $L = dP_1/dP_0$ denotes the likelihood ratio, $E_0$ indicates
that the expectation is taken with respect to P0, C(·) is a convex
real function and h(·) is an increasing real function of a real
variable. It is well established that many well known measures of
discrimination including the J-divergence, the Bhattacharyya
distance, the Kullback-Leibler distance, and Kolmogorov’s measure
of variational distance are elements of this class.
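
To make this class concrete, the sketch below evaluates three of its members for discrete distributions. It assumes the usual Ali-Silvey form h(E0[C(L)]) with likelihood ratio L = p1/p0 and strictly positive probabilities. The function names and the particular choices of C and h are illustrative assumptions, not code from the paper.

```python
import math


def ali_silvey(p0, p1, C, h=lambda x: x):
    """h(E_0[C(L)]) for discrete distributions, with L = p1/p0 and p0 > 0."""
    expectation = sum(q0 * C(q1 / q0) for q0, q1 in zip(p0, p1))
    return h(expectation)


def kullback_leibler(p0, p1):
    # C(l) = -log(l) is convex and h is the identity: E_0[log(p0/p1)].
    return ali_silvey(p0, p1, C=lambda l: -math.log(l))


def bhattacharyya(p0, p1):
    # C(l) = -sqrt(l) is convex; h(x) = -log(-x) is increasing for x < 0.
    return ali_silvey(p0, p1, C=lambda l: -math.sqrt(l),
                      h=lambda x: -math.log(-x))


def variational(p0, p1):
    # C(l) = |1 - l| is convex and h is the identity: sum of |p0 - p1|.
    return ali_silvey(p0, p1, C=lambda l: abs(1.0 - l))


if __name__ == "__main__":
    p0, p1 = [0.5, 0.5], [0.25, 0.75]
    print(kullback_leibler(p0, p1), bhattacharyya(p0, p1), variational(p0, p1))
```

All three distances vanish when the two distributions coincide and grow as they separate, which is the property that makes them candidates for detector design criteria.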

Relationships  between  the  minimum  probability  of  error 
for deciding between P0 and P1 and various f-divergences have
been  studied  at  great  length  [1,2,4].  Most  of  this  work  has 
focused  on  developing  bounds  for  the  minimum  probability  of 
error  in  terms  of  several  f-divergences.  Other  work  has  centered 
on  utilizing  these  relationships  to  optimize  communication  sys¬ 
tems  with  respect  to  specific  distance  measures  rather  than  the 
less analytically tractable probability of error [5, 6]. However,
there  has  been  no  work  to  our  knowledge  on  studying  the  re¬ 
lationship  between  the  probability  of  error  for  suboptimal  de¬ 
tection  strategies  and  f-divergences. 

While it is well known that the minimum probability of
error between P0 and P1 is an f-divergence between these measures,
we show that the probability of error derived from any
decision strategy not equivalent to the likelihood ratio test is
not equivalent to an f-divergence (i.e., not a measure of
discrimination between P0 and P1). This implies that designing
suboptimum detection strategies by maximizing the distance
between the input statistics may not only be inappropriate but
may also lead to inconsistent solutions.

¹Supported in part by the National Science Foundation under
Grant NCR-9109858 and by Rome Laboratories under contract
F30603-93-C-0053.

²Supported in part by Rome Laboratories under contract F30603-
92-C-0053.

This result seems to suggest that f-divergences have limited
applicability to the design of suboptimal detectors. However,
we demonstrate a direct linkage between the performance of
suboptimal detectors and f-divergences through a form of the
data processing theorem. Specifically, we show that the loss
in performance of any suboptimal detector over the minimum
probability of error detector is bounded below by the loss of
“information” across the detector as determined by a specific
f-divergence. As such, when this lower bound holds with equality,
minimizing the loss of information with respect to this specific
f-divergence results in the minimum probability of error
solution.

Unfortunately, the f-divergence in the lower bound is just
as difficult to compute as the probability of error. Thus, we
develop sufficient conditions under which all f-divergences have
a common extremum over a class of probability measures. The
significance of this result to the problem of detector design
is that one may maximize the most analytically tractable
f-divergence between the statistics of the detector to minimize
the lower bound on the performance loss of the detector. Most
importantly, when this lower bound holds with equality, the
resulting solution minimizes the probability of error over the
class of allowable detectors. To demonstrate the applicability
of this theory, we determine the optimal linear detector in the
presence of a specific non-Gaussian noise and show in fact that
the matched filter is not always the “best” linear receiver. In a
second example, we apply this theory to the problem of signal
design and determine the optimal signal set in the presence of
a specific non-Gaussian noise.

ABSTRACT

The  number  of  samples  required  for  signal  detection  is  considered 
as  a  function  of  the  error  probabilities.  This  problem  is  treated  in  the 
context  of  detecting  a  constant  signal  in  additive,  independent  and 
identically  distributed  noise.  Detectors  that  base  their  decisions  on 
the  comparison  with  a  threshold  of  accumulated,  nonlinearly  trans¬ 
formed  observations  are  treated.  Asymptotic  expressions  are  derived 
for  the  relationship  between  sample  size  and  error  probabilities  for  this 
model  in  two  situations:  that  in  which  the  nonlinearity  has  a  partially 
absolutely  continuous  output  distribution;  and  that  in  which  it  has 
a  lattice  output  distribution.  Traditional  analyses  of  such  problems 
have  involved  only  the  lowest-order  terms  of  such  relationships  (i.e., 
central  limit  results),  leading  to  performance  indices  such  as  the  Pit¬ 
man  asymptotic  relative  efficiency  (ARE).  Such  indices  are  known  to 
be  of  limited  accuracy  in  predicting  performance  for  more  moderate 
sample  sizes.  Here,  the  behavior  of  sample  size  as  a  function  of  error 
probabilities  is  considered  in  more  detail,  leading  to  more  accurate 
indices  of  relative  efficiencies  for  such  detection  problems.  Several 
specific  examples  are  examined  in  detail,  and  numerical  results  are 
included  to  illustrate  the  significantly  improved  performance  estima¬ 
tion  afforded  thereby  for  even  small  sample  sizes. 

Introduction  and  Overview 

In this paper, we consider the following pair of statistical hypotheses
concerning a set $x_1, \ldots, x_n$ of random observations:

$$H_0: x_k = \xi_k, \quad k = 1, \ldots, n$$

versus

$$H_1: x_k = \theta + \xi_k, \quad k = 1, \ldots, n$$

where $\{\xi_k\}$ is a sequence of independent and identically distributed
(i.i.d.) random variables (r.v.’s) with marginal probability density
function f, and θ > 0 is the signal strength. In order to test between
these hypotheses, we consider threshold tests based on statistics of
the form

$$T_n = \sum_{k=1}^{n} g(x_k)$$

where g is a measurable real-valued function.

A  traditional  way  of  comparing  two  detectors  that  operate  in  this 
way  is  to  consider  the  relative  sample  sizes  they  require  to  achieve  the 
same  performance  in  terms  of  the  false-alarm  and  miss  probabilities. 
These  required  sample  sizes  are  usually  estimated  through  the  use  of 
the  central  limit  theorem  (CLT)  in  describing  the  behavior  of  the  test 
statistics $T_n$. Such comparisons are conventionally made in terms
of the asymptotic value of the ratio between required sample sizes,
in the limit as the signal strength θ approaches zero at an appropriate rate (see, e.g.,
Poor  [1]).  With  fixed  error  probabilities  this  limit  forces  the  sample 
size  to  infinity,  and  the  corresponding  limiting  ratio  is  the  (Pitman) 
asymptotic  relative  efficiency  (ARE). 
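
The lowest-order (CLT) estimate described above can be sketched in a few lines of Python. The sketch assumes the standard Gaussian approximation n ≈ (z_α + z_β)² / (θ² η), where η is the efficacy of the detector nonlinearity, α and β are the false-alarm and miss probabilities and θ is the signal strength. The function names are hypothetical, and the efficacy expressions are the textbook ones for the linear and sign detectors rather than results from this paper.

```python
import math
from statistics import NormalDist


def clt_sample_size(alpha, beta, theta, efficacy):
    """CLT estimate of the sample size for error probabilities alpha and beta."""
    z = NormalDist().inv_cdf
    return ((z(1.0 - alpha) + z(1.0 - beta)) / theta) ** 2 / efficacy


def efficacy_linear(sigma):
    """Efficacy of g(x) = x in zero-mean noise with standard deviation sigma."""
    return 1.0 / sigma ** 2


def efficacy_sign(f_at_zero):
    """Efficacy of g(x) = sign(x) for a symmetric noise density with value f(0)."""
    return 4.0 * f_at_zero ** 2


if __name__ == "__main__":
    gauss_f0 = 1.0 / math.sqrt(2.0 * math.pi)  # unit-variance Gaussian density at 0
    n_lin = clt_sample_size(0.05, 0.05, 0.5, efficacy_linear(1.0))
    n_sgn = clt_sample_size(0.05, 0.05, 0.5, efficacy_sign(gauss_f0))
    # The ratio of the two estimates is the ARE of the sign detector
    # relative to the linear detector in Gaussian noise.
    print(math.ceil(n_lin), math.ceil(n_sgn), n_lin / n_sgn)
```

Because both estimates share the same quantile factor, their ratio depends only on the efficacies, which is why the ARE carries no information about how the relative efficiency changes with α, β or moderate n.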

There  are  several  practically  interesting  noise  densities  /  (Gaus¬ 
sian,  Laplacian,  sech)  and  detection  functions  g  (linear,  signum,  dead- 
zone)  for  which  it  is  possible  to  calculate  exact  values  of  necessary 
sample  sizes.  Studies  [2-6]  of  such  cases  have  shown  that  the  ARE 

¹This research was supported by the U.S. National Science Foundation under
Grant NCR-90-02767.


gives  a  quite  good  approximation  (within  let  us  say  10%)  to  the  ac¬ 
tual  relative  efficiency  (RE)  of  two  detectors,  when  the  sample  size  is 
rather  large  (n  >  10®).  For  moderate  sample  sizes  (say,  n  =  20—  100) 
the  ARE  is  much  less  accurate  (within  50%),  but  it  is  still  reasonable. 

In  order  to  better  approximate  RE  for  moderate  sample  sizes  it 
is  quite  natural  to  consider  refinements  of  the  CLT  using  asymptotic 
expansions. The applicability of such asymptotic expansions in related
problems was shown in pioneering work of Cramér (see, e.g.,
[7]). However, this approach has not been developed in the context of
signal detection, although alternative intermediate estimates for RE
(without asymptotic expansions in the CLT) have been considered in
[4, 8].

The contribution of this paper is to develop, for given error
probabilities α and β, detector function g and signal strength θ,
asymptotic expansions for the necessary sample size n(α, β, g, θ)
as θ → 0⁺, through the use of asymptotic expansions in the CLT, and
to explore the accuracy of approximations based on these expansions.
The presentation of these developments is organized as follows.
First, we provide a brief review of relevant results on the
asymptotic expansion of distributions of sums of i.i.d. random
variables. Then, we develop the desired expansions for sample size
for two basic cases: that in which the distribution of g(x_k) is of
“density” type (meaning that its characteristic function converges
to a value less than unity with increasing argument), and that in
which the distribution of g(x_k) is of lattice type. Finally, we
consider several specific examples and illustrate the accuracy of
the developed expansions numerically.

Locally optimal detection of a weak, known signal in independent,
identically distributed (i.i.d.) noise with known probability
density function involves a nonlinear correlator: a memoryless
nonlinearity that depends on the noise density function followed by
a correlator [1]. In practice, the noise distribution may not be
known precisely, but several moments of the noise may be known. We
determine a robust detector using a limited number of moments that
describe the ε-contaminated noise.

A unique robust maximin efficacy detector has been found for
a nonlinear correlator and an ε-contaminated noise model when
ε and the nominal noise density are completely specified. The
least-favorable densities for a strongly unimodal nominal density [2]
and for a nonstrongly unimodal nominal density [3] were obtained
from the efficacy saddle-point property. For both the strongly and
nonstrongly unimodal nominal density noise models, the robust
nonlinearity depends on the least-favorable noise density in the
same way that the locally optimal nonlinearity depends on the
assumed known noise density. In contrast, we assume a parametric
form for the nominal density of a mixture model and find the
parameters that yield maximin efficacy, given a limited number
of moments of the noise.

The problem is modeled as deciding between the null hypothesis
X = N and the alternative hypothesis X = θs + N,
where X is an n-element random observation vector, N is a vector
of i.i.d. noise random variables with univariate density f, s is
a vector of a known signal with finite, nonzero power, θ = K/√n
for some unknown K > 0, and n is the number of samples. Let
the set of admissible noise densities be the absolutely continuous,
ε-contaminated densities

$$\mathcal{F} = \{ f \mid f(x) = (1 - \epsilon)\, g_{\mathbf{a}}(x) + \epsilon\, h(x) \},$$

where $g_{\mathbf{a}}$ is an even symmetric, nonstrongly unimodal density
with unknown parameter vector $\mathbf{a}$, h is any even symmetric density
from the convex class defined in [3], and ε is the unknown
contamination parameter. Let Ψ be the set of nonlinearities with
derivatives almost everywhere with respect to Lebesgue measure.
The robust maximin nonlinear correlator, in terms of efficacy
η(ψ, f), is at the saddle-point (ψ_0, f_0) that satisfies

$$\max_{\psi \in \Psi} \eta(\psi, f_0) = \min_{f \in \mathcal{F}} \eta(\psi_0, f), \qquad (1)$$

where ψ_0 is the robust nonlinearity and f_0 is the least favorable
density in terms of efficacy. From the saddle-point property in
equation (1) and the Cauchy-Schwarz inequality, the robust
nonlinearity is the locally optimal nonlinearity $\psi_0 = -f_0'/f_0$, and the
least favorable density minimizes the Fisher information and is
given by [3]

$$f_0(x) = \begin{cases} (1-\epsilon)\, g_{\mathbf{a}}(a) \exp[-k(|x| - a)], & a < |x| < b \\ (1-\epsilon)\, g_{\mathbf{a}}(x), & \text{otherwise,} \end{cases}$$

where $k = -g_{\mathbf{a}}'(a)/g_{\mathbf{a}}(a)$, a can be uniquely determined from ε,
and b is chosen so that f_0 is absolutely continuous.
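
A minimal sketch of a nonlinear correlator follows, to show how a nonlinearity of the form ψ = −f′/f enters the test statistic. For illustration only, it uses an untruncated Cauchy density with scale parameter `scale`, for which −f′(x)/f(x) = 2x/(scale² + x²). The robust nonlinearity of the paper comes from the least favorable density instead, and all names below are hypothetical.

```python
def cauchy_lo_nonlinearity(x, scale):
    """Locally optimal nonlinearity -f'(x)/f(x) for f(x) proportional to 1/(scale^2 + x^2)."""
    return 2.0 * x / (scale ** 2 + x ** 2)


def nonlinear_correlator(observations, signal, nonlinearity):
    """Test statistic: sum over k of s_k * psi(x_k), to be compared with a threshold."""
    return sum(s * nonlinearity(x) for x, s in zip(observations, signal))


if __name__ == "__main__":
    signal = [1.0, -1.0, 1.0, -1.0]
    observations = [0.8, -1.3, 25.0, -0.4]  # the third sample is an outlier
    psi = lambda x: cauchy_lo_nonlinearity(x, 1.0)
    print(nonlinear_correlator(observations, signal, psi))
    # A linear correlator (psi(x) = x) lets the outlier dominate the sum.
    print(nonlinear_correlator(observations, signal, lambda x: x))
```

The Cauchy nonlinearity redescends for large |x|, so a single large observation contributes little to the statistic, which is the behavior one wants in heavy-tailed noise.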


Rather than assuming that the nominal noise density and ε
are known, we select the parametric form of the nominal density
and numerically determine ε and $\mathbf{a}$ by finding a minimum of the
Fisher information of the least favorable density while satisfying
the moments of the noise. Our results use the variance and kurtosis
of the noise and a normalized, truncated Cauchy nominal
density with $\mathbf{a} = [a_1, a_2]$, where $a_1$ is the scale parameter and
$a_2$ the truncation point. Detection performance is examined using
Monte Carlo simulations with noise distributions that differ
from the Cauchy contaminated noise model. The noise realizations
are from normalized densities truncated at $\pm a_2$ with either
a Gaussian-Gaussian mixture density, a Johnson-$S_U$ density, or
the least favorable density with parameters chosen to satisfy the
given variance and kurtosis. The number of samples n comprising
the test statistic at the output of the nonlinear correlator is
equal to 1000. The input signal-to-noise ratio (SNR) is -25 dB.
The output SNR is the simulation performance measure since it
is approximately proportional to efficacy for a weak signal [1].

Performance of the robust detector compares favorably with
the linear detector, sign detector, and the locally optimal detector
of the simulation noise. For very heavy-tailed noise, the robust
detector performs as well as the sign detector, and significantly
better than the linear detector. For moderately heavy-tailed noise,
the robust detector performs better than the sign detector and,
in some cases, better than the linear detector. In many cases,
the robust detector’s performance approaches that of the locally
optimal detector derived with knowledge of the complete noise
statistics.

ABSTRACT

Linear processes are suitable for modeling random received
waveforms in a scattering medium, which represents radar,
sonar and multipath communication channels. We address
a continuous-time detection problem where both the noise
(hypothesis H0) and signal-plus-noise (H1) waveforms are
modeled as linear processes. Uncertainty in the nominal
model is considered in the form of classes of probability
distributions induced on a function space by the processes
under the two hypotheses. By embedding the linear processes
in the larger class of infinitely divisible processes
and using an integral representation for the latter class,
we identify the pair of distributions that are least favorable
for the discrimination of linear processes; an optimal
detector designed for these distributions is robust for the
uncertainty classes considered.

1.  PRELIMINARIES 
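A linear process y(t) is defined by

$$y(t) = \int_0^T f(t, s)\, dx(s), \quad t \in I \triangleq [0, T], \qquad (1)$$

x(s) being an independent increment process.

The characteristic function of $y(\theta) = (y(t_1), \ldots, y(t_n))$ is

$$\ln \phi_{y(\theta)}(u) = \int_0^T \!\! \int_{\mathbb{R}} \left( e^{iwv} - iwv - 1 \right) G(ds \times dv), \qquad (2)$$

where $w = \sum_{k=1}^{n} u_k f(t_k, s)$ and G(·) is a finite measure
given by $E x^2(t) = \int_0^t \!\! \int_{\mathbb{R}} v^2\, G(ds \times dv)$. Defining a measure
$\Lambda_\theta(A) = G((s, v) : (v f(t_1, s), \ldots, v f(t_n, s)) \in A)$ on
$\mathbb{R}^n$, $\theta = \{t_1, \ldots, t_n\}$, (2) can be written as

$$\ln \phi_{y(\theta)}(u) = \int_{\mathbb{R}^n} \left( e^{i \langle u, v \rangle} - i \langle u, v \rangle - 1 \right) \Lambda_\theta(dv), \qquad (3)$$

which is a canonical form of an infinitely divisible random
vector’s characteristic function. Hence, y(t) is an infinitely
divisible process [1].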

A projective limit measure Λ(·) on $\mathbb{R}^I$ can be constructed
from the family {Λ_θ(·)} such that $\Lambda(\pi_\theta^{-1} A) = \Lambda_\theta(A)$,
where $\pi_\theta(\cdot)$ is the projection mapping from $\mathbb{R}^I$ to $\mathbb{R}^n$. We
assume that Λ(·) is restricted to $X = D_j(I)$. The projective
limit measure of y(t) is $\Lambda(B) = G((s, v) : v f(\cdot, s) \in B)$,
$\forall B \in \mathcal{B}(X)$, the Borel sets of X [1].

Maruyama (cf. [2]) obtains an integral representation for
infinitely divisible processes by considering a Poisson random
measure Π(·) on $\mathcal{B}(X)$ with intensity Λ(·): for disjoint
$B_j \in \mathcal{B}(X)$, $\Pi(B_j)$ are independent Poisson random variables
with $E\,\Pi(B_j) = \Lambda(B_j)$, and $\Pi(\cdot, \omega)$ is a measure on
$\mathcal{B}(X)$ a.s. We then have

$$y(\cdot) = \int_X z\, [\Pi(dz) - \Lambda(dz)]. \qquad (4)$$

¹Supported by ONR Grant N00014-89-J-3152.


Equation (4) is symbolic and denotes equality in distribution
of $(y, \psi)$ and $\int_X (z, \psi)\, [\Pi(dz) - \Lambda(dz)]$, $\psi \in X$.

Denote by $P_0$ and $P_1$ the probability measures induced
by y(t) on X under $H_0$ and $H_1$. Using (4), we find that
$P_1 \ll P_0$ and that under $H_0$,

$$\frac{dP_1}{dP_0}(y) = \exp\left\{ -K + \int_X \ln \frac{d\Lambda_1}{d\Lambda_0}(z)\, \Pi_0(dz) \right\}, \qquad (5)$$

where $K = \Lambda_1(X) - \Lambda_0(X)$. Note that (5) is only a representation
formula and is not always computable in terms of
y(t). The mapping of $\Pi_0(\cdot)$ into X defined by the integral
in (4) is not one-to-one and hence may not be invertible.
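
For intuition about a likelihood ratio between Poisson random measures, the following Python sketch treats the simplest case: a finite set of bins with finite intensities. It assumes the standard result that the log-likelihood ratio is −K plus the sum of the counts weighted by ln(λ1/λ0), with K equal to the difference of the total intensities. The binned setting and all names are illustrative assumptions, not the function-space construction used in the text.

```python
import math


def poisson_log_likelihood_ratio(counts, lam0, lam1):
    """Log-likelihood ratio of independent Poisson counts under intensities lam1 vs lam0."""
    K = sum(lam1) - sum(lam0)  # difference of the total intensity masses
    return -K + sum(c * math.log(l1 / l0)
                    for c, l0, l1 in zip(counts, lam0, lam1))


def poisson_log_pmf(count, lam):
    """Log of the Poisson probability mass function, used only as a cross-check."""
    return -lam + count * math.log(lam) - math.log(math.factorial(count))


if __name__ == "__main__":
    counts = [3, 0, 2]
    lam0 = [1.0, 2.0, 0.5]
    lam1 = [2.0, 1.0, 1.5]
    direct = sum(poisson_log_pmf(c, l1) - poisson_log_pmf(c, l0)
                 for c, l0, l1 in zip(counts, lam0, lam1))
    print(poisson_log_likelihood_ratio(counts, lam0, lam1), direct)
```

The statistic depends on the data only through the counts in each bin, that is, through the realization of the Poisson measure itself rather than through a process built from it.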

2.  ROBUST  DETECTION  OF  LINEAR 
PROCESSES 

We address the minimax robust problem

$$\min_{\phi} \sup_{P_1 \in \mathcal{P}_1} R(P_1, \phi) \quad \text{subject to} \quad \sup_{P_0 \in \mathcal{P}_0} R(P_0, \phi) \le \alpha, \qquad (6)$$

where $R(P_j, \phi)$, j = 0, 1, are the expected risks. The
classes $\mathcal{P}_j$ correspond to the ε-contamination or total variation
neighborhoods ($\mathcal{C}_j$ or $\mathcal{D}_j$) of nominal measures
$\Lambda_j$. If a least favorable pair of distributions $P_j'$ satisfying
$R(P_j', \phi') \ge R(P_j, \phi')$ $\forall P_j \in \mathcal{P}_j$, ∀ likelihood ratio tests $\phi'$
between $P_0'$ and $P_1'$ exists, (6) is solved by the likelihood
ratio test between $P_0'$ and $P_1'$ with a statistic of the
form (5).

Consider the robust discrimination of Poisson random
measures on X with classes of distributions $\mathcal{P}_{\Lambda_j}$ generated
by intensity measures in $\mathcal{C}_j$. Suppose that $(\Lambda_0', \Lambda_1')$ is
the least favorable pair identified by Huber’s theory after
normalizing $\mathcal{C}_j$ to classes of probability distributions. It
follows that the Poisson distributions $P_{\Lambda_0'}$ and $P_{\Lambda_1'}$ corresponding
to $\Lambda_0'$ and $\Lambda_1'$ are least favorable [3].

The robust detector for linear processes now follows. Using
representation (4) of linear processes in terms of Poisson
random measures, and the fact that the likelihood ratio (5)
is identical to that between Poisson random measures, we
can show that the probability measures $P_0'$ and $P_1'$ corresponding
to $\Lambda_0'$ and $\Lambda_1'$ are least favorable [2].

Abstract

This paper extends previous work on the likelihood detection
of cyclostationary processes in stationary Gaussian noise. In
contrast to previous developments, we use a Gaussian cyclostationary
signal assumption rather than a weak signal assumption.
Under  the  assumption  of  completely  known  statistics  for  signal 
and  noise,  the  likelihood  ratio  detector  is  derived  for  two  related 
cases:  signal  detection  and  detection  of  cyclostationarity.  The 
difference  between  these  two  cases  involves  different  models  for 
the  stationary  statistics  under  the  two  hypotheses. 

Summary 

We formulate the detection problem based on the complex
$n_w$-element observation vector W, whose elements are samples of
a bandlimited process w(t) with the first sample at index t = 0,
and subsequent samples spaced by d. The noise, Z, is a zero-mean
complex, stationary Gaussian random vector with Toeplitz,
Hermitian autocovariance matrix F. The signal, X, is a zero-mean
complex, cyclostationary Gaussian random vector with a
finite set of cycle frequencies that are harmonically related with
fundamental frequency α. (The extension to polycyclostationarity,
with incommensurate fundamental frequencies, is straightforward.)
The signal autocovariance matrix is $C = C_0 + C_c$, where
$C_0$ is the Toeplitz, Hermitian component of C that corresponds
to a stationarized version of X, and $C_c = \sum_{m \neq 0} C_m$, where $C_m$
corresponds to the mth cycle frequency. Each matrix $C_m$ is the
Hadamard product of a Toeplitz Hermitian matrix and a Hankel
matrix that is periodic on its diagonals. The (p, x) element of
$C_m$ is $(C_m)_{(p,x)} = c_m((p - x)d)\, \exp\{ i m (\psi + 2\pi\alpha (p - 1) d) \}$, where
$c_m(\tau)$ is the mth Fourier series coefficient of the autocovariance
function, at lag τ, of the continuous-time signal from which X is
derived, and ψ is the cyclic phase: the phase offset of the
fundamental cycle frequency relative to the sampling instant for the
first element in X.

The signal detection problem involves detecting a cyclostationary
signal added to stationary noise [1]. The observation vector
under the null hypothesis is W = Z, while the observation
vector under the alternative hypothesis is W = X + Z. When
the signal and noise statistics are completely known, the likelihood
ratio test yields a sufficient statistic that is the quadratic
form $\Lambda(W) = W^* L W = \mathrm{trace}\{L W W^*\}$, where * represents the
adjoint, and $L = F^{-1} C (F + C)^{-1}$. The test involves knowledge of
the signal power and the noise power, as well as the cyclic phase
of the signal.

When the signal is weak, the sufficient statistic reduces to

$$\Lambda(W) \approx \Lambda_w(W_w) \triangleq W_w^* W_w + \sum_{m \neq 0} \mathrm{trace}\{ C_0^{-1/2} C_m C_0^{-1/2} W_w W_w^* \},$$

where $W_w$ is the linear time-invariant filtering of the observation
vector, $W_w \triangleq C_0^{1/2} F^{-1} W$, and $C_0^{1/2}$ is the positive definite square
root of $C_0$. The test is the sum of an energy detector and the
coherent sum of detectors for each cycle frequency [2].

When the signal is strong, the sufficient statistic reduces to
$\Lambda(W) \approx W^* F^{-1} W$. The linear operator depends only on the statistics
of the noise, and does not involve cyclostationary statistics of
the signal.
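
The form of the operator L and its two limits can be checked with a short derivation. The sketch below assumes that W is zero-mean complex Gaussian with covariance F under the null hypothesis and F + C under the alternative. For the strong-signal limit it also assumes that C is invertible, which is an assumption added here for illustration.

```latex
\begin{align*}
\ln \frac{p_1(W)}{p_0(W)} &= W^{*}\left[F^{-1} - (F + C)^{-1}\right] W + \text{const}, \\
F^{-1} - (F + C)^{-1} &= F^{-1}\left[(F + C) - F\right](F + C)^{-1}
                       = F^{-1} C (F + C)^{-1} = L, \\
\text{weak signal: } (F + C)^{-1} \approx F^{-1}
  &\;\Rightarrow\; L \approx F^{-1} C F^{-1}, \\
\text{strong signal: } (F + C)^{-1} \approx C^{-1}
  &\;\Rightarrow\; L \approx F^{-1} C C^{-1} = F^{-1}.
\end{align*}
```

In the weak-signal case the cyclostationary part of C survives inside the quadratic form, whereas in the strong-signal case it cancels, in line with the statement that the operator then depends only on the noise statistics.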

One deviation from the assumption of known signal statistics
is when the cyclic phase is unknown but constant over the observation.
The likelihood ratio test statistic is $\Lambda(W) = \int \Lambda(W \mid \psi)\, f_\psi(\psi)\, d\psi$,
where $\Lambda(W \mid \psi)$ is conditioned on a fixed cyclic phase, and $f_\psi(\psi)$
is the probability density function of the cyclic phase. When the
cyclic phase is uniformly distributed over (0, 2π), the likelihood
ratio test does not depend on the $C_m$, but only on $C_0$.

The cyclostationary detection problem involves determining
whether an observation has periodic statistics when the stationary
statistics are the same under the two hypotheses. The observation
vector under the null hypothesis is W = Y + Z, where Y is zero-mean,
Gaussian, and stationary with autocovariance matrix $C_0$,
while the observation vector under the alternative hypothesis is
W = X + Z, where the statistics of X are the same as for the
signal detection problem. When the signal and noise statistics are
completely known, the likelihood ratio test statistic is a quadratic
form with operator $L = (F + C_0)^{-1} C_c (F + C_0 + C_c)^{-1}$. When
the noise is much stronger than the signal, the linear operator
becomes $L \approx F^{-1} C_c F^{-1}$, which does not depend on the stationary
statistics of the signal.

Reduced-state sequence detection is a method of reducing
the state trellis of a channel code to a smaller structure. We show
that it does not reduce the complexity of BSC decoding for good
convolutional codes.

The  reduced-state  sequence  detector,  or  RSSD,  has  been 
introduced in works of Eyuboglu, Qureshi, Chevillat, Eleftheriou,
Aulin and Larsson. Because there exists some confusion over
precisely how an RSSD works, we will briefly review the
procedure here. First, we define some ground rules for the decoder
design.  We  consider  only  decoders  that  never  backtrack,  as  does 
the  stack  algorithm  for  instance;  furthermore,  they  retain  a  fixed 
number  of  survivors  at  each  trellis  level,  as  do  the  Viterbi  and  M- 
algorithms;  finally,  the  decoding  is  bounded-distance  decoding, 
which  means  that  channel  error  correction  is  guaranteed  so  long  as 
the  error  sequence  is  of  some  size  d/2  or  less.  With  these 
assumptions,  the  RSSD  idea  works  as  follows.  Code  trellis  states 
at  level  n  are  grouped  into  classes,  defined  by  the  condition  that  no 
code  words  leading  into  states  in  a  class  are  closer  than  d  to  any 
other  code  words  leading  into  other  states  in  the  class.  When  the 
decoder  search  moves  on  to  the  next  trellis  level,  only  one  survivor 
leading  into  each  class  is  kept.  This  contrasts  with  the  usual  state 
trellis  decoding,  in  which  one  survivor  is  kept  into  each  state. 
RSSD  works  because  if  the  noisy  received  word  satisfies  the 
bounded-distance  condition,  the  transmitted  path  has  to  be  among 
the  survivors.  The  full  bounded-distance  potential  of  the  code  may 
be  obtained  if  the  d  parameter  is  set  equal  to  the  code  free  distance. 

RSSDs should not be confused with reduced-search
decoders, which search only a small part of the original, large
trellis; in an RSSD, the trellis is reduced a priori and all of it is
searched.

We give a new algorithm that forms the optimally reduced
trellis  for  a  convolutional  code  and  a  given  d,  and  we  show  what 
happens  when  the  algorithm  is  applied  to  good  codes.  The  class- 
forming  algorithm  depends  on  the  linear-code  symmetries  of 
convolutional  codes  and  falls  into  three  parts.  The  first  two  parts 
act  to  form  the  class  that  contains  trellis  state  0.  In  part  I,  trellis 
state  i  is  tested  to  see  whether  it  may  be  classified  with  state  0. 
The  procedure  reduces  to  a  dynamic  program  (i.e.,  the  Viterbi 
algorithm),  run  on  the  code  trellis  until  the  distance  into  each  state 
in  the  trellis  reaches  a  steady  state.  If  the  least-weight  path  into 
state  i  in  this  steady-state  condition  is  heavier  than  d,  then  state  i 
may  be  added  to  the  class  containing  state  0;  otherwise,  it  may  not. 
Part  II  tests  whether  a  state  that  may  be  classed  with  state  0  may 
be  combined  with  other  states  j,  k,  ...  that  already  have  been 
classed with state 0. Part III forms the other classes, based on the
class  that  contains  state  0. 
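
The Part I test can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes a rate-1/n feed-forward encoder, takes the (7,5) code with d = 5 purely as an example, and uses hypothetical function names. The dynamic program starts in state 0 and is iterated until the least path weight into every state stops changing; state i is a candidate for the class of state 0 only if its steady-state weight is heavier than d.

```python
def least_weight_into_states(generators, memory):
    """Least Hamming weight of any code path from state 0 into each state."""
    n_states = 1 << memory
    inf = float("inf")
    weights = [inf] * n_states
    weights[0] = 0
    changed = True
    while changed:  # one pass per trellis level, until steady state
        changed = False
        updated = list(weights)
        for state in range(n_states):
            if weights[state] == inf:
                continue
            for bit in (0, 1):
                register = (bit << memory) | state
                # branch weight = number of generator parities equal to 1
                branch = sum(bin(register & g).count("1") % 2 for g in generators)
                nxt = register >> 1
                if weights[state] + branch < updated[nxt]:
                    updated[nxt] = weights[state] + branch
                    changed = True
        weights = updated
    return weights


def candidates_for_class_of_state_0(generators, memory, d):
    """States whose steady-state least weight is heavier than d (Part I test)."""
    weights = least_weight_into_states(generators, memory)
    return [s for s in range(1, len(weights)) if weights[s] > d]


# Example (assumption): the 4-state (7,5) code, tested with d = 5.
weights = least_weight_into_states((0b111, 0b101), memory=2)
candidates = candidates_for_class_of_state_0((0b111, 0b101), memory=2, d=5)
print(weights, candidates)
```

For this example no state passes the test, which is in line with the observation that good codes admit no reduction.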

If  the  algorithm  just  described  cannot  find  any  state  that 
may  be  classed  with  state  0,  then  by  code  linearity,  no  states  in  the 
code  trellis  may  be  classed  with  any  other  states,  and  the  RSSD 
idea  fails  to  produce  a  smaller  trellis  structure  than  the  original 
state trellis. We have applied the algorithm to a large number of
good codes of rates 1/3, 1/2, 2/3 and 3/4, and have found no code
that admits of any RSSD trellis reduction. Only when the
class-finding algorithm is applied to codes with much poorer free
distance, such as QLI codes and feed-forward systematic codes, do
non-trivial classes get formed. Then an interesting phenomenon
occurs: the number of classes so formed is invariably almost the
same as the number of states in the best-free-distance code with
free distance equal to the d parameter used in the class formation. In this
way, the RSSD idea seems to convert bad codes into good ones, so
far as the trellis size needed to attain a d is concerned.

In  conclusion,  RSSD  seems  to  point  to  some  interesting 
structural  properties  of  codes,  but  it  does  not  create  a  simpler 
decoder  for  codes  that  are  already  good.  It  is  also  of  interest  to 
compare  the  RSSD  to  the  M-algorithm,  which  obeys  the  same 
ground  rules.  It  is  easy  to  see  that  the  M-algorithm  is  the  optimal 
non-backtracking  decoder  that  keeps  a  fixed  number  of  survivors. 
A more subtle proof shows that RSSD cannot keep fewer survivors
while attaining the same bounded-distance d parameter; we show
this in [1]. Examples can be given that show that the M-algorithm
actually  retains  many  fewer  survivors  for  the  same  d.  For  rate  1/2 
coding,  RSSD  must  retain  about  4'"’  survivors,  while  the  M- 
algorithm  needs  only  2.414“''^.  The  difference  is  much  more 
extreme  in  intersymbol  interference  problems.  These  facts  make 
sense  when  one  considers  that  RSSD  makes  its  trellis  reduction  a 
priori,  while  the  M-algorithm  and  other  reduced-search  decoders 
make  their  reductions  after  viewing  the  received  channel  sequence. 


[1] J.B. Anderson and E. Offer, "Reduced-state sequence
detection with convolutional codes," in submission, IEEE
Trans.  Information  Theory,  August  1992. 
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Mapping  the  Boundaries  Established  by 
State  Diagram  Connectivity 

Oliver  Collins 
Johns  Hopkins  University 

This  talk  will  analyze  how  the  intrasystem 
communication  between  the  different  parts  of  a 
partitioned  Viterbi  decoder  is  affected  by 
removing  the  restriction  that  states  be 
permanently  assigned  to  particular  modules.  It 
is  the  strains  on  intrasystem  communication 
that  limit  the  coding  gains  achievable  with 
VLSI  technology.  Some  of  the  techniques 
outlined  will  also  be  useful  for  improving 
concatenated  coding  systems  using  non- 
partitioned  decoders.  The  talk  will  require 
that  the  fundamental,  atomic  element  of  a 
decoder,  i.e.,  the  add  compare  select  unit, 
remain  unchanged  but  will  impose  no  restriction 
on  which  state  a  given  processor  handles  at  any 
time.  The  lower  bounds  on  intrasystem 
communication  will  result  from  following  the 
flow  of  imaginary  tokens  which  move  along  the 
same  paths  as  the  accumulated  metrics.  The 
tokens  will  flow  through  an  encoder  state 
diagram  which  has  been  unraveled  in  time.  Each 
column  from  left  to  right  represents  the  next 
decoded  bit  time  and  the  nodes  within  a  column 
are  the  different  possible  states  at  that  time. 

The  easiest  way  to  discover  what  this  new 
time-flow  graph  looks  like  is  to  examine  a 
logically  equivalent  but  structurally  different 
encoder,  i.e.,  given  the  same  sequence  of  input 
bits  this  new  encoder  always  produces  the  same 
sequence  of  output  bits  that  a  conventional 
convolutional  encoder  would;  however,  the 
internal  mechanics  are  very  different.  The 
state  of  this  encoder  will,  of  course,  depend 
on  the  most  recent  K  bits  of  the  data  stream, 
but  instead  of  shifting  all  of  the  bits  to  the 
right  to  make  room  for  a  new  incoming  bit,  the 
bits  which  make  up  the  state  are  replaced 
sequentially,  and  no  shifting  takes  place.  The 
new  encoder  is  time  varying;  it  contains  a 
pointer  which  starts  out  at  the  first  memory 
cell  and  then  moves  on  to  the  second,  third  and 
so  on.  When  it  reaches  the  last  cell  it  cycles 
back  to  the  first.  The  pointer  indicates  which 
bit  of  the  state  will  be  replaced  by  the  new 
information bit. A suitably rotating set of
generator polynomials will clearly allow this
new encoder to reproduce the same sequence of
symbols as the original. Although,
technically, time is now a part of the encoder
state, the receiver does not have to estimate
this variable.

[Figure: K=5 equivalent encoder]
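
The equivalence of the two encoders can be checked with a short sketch. The tap patterns, the choice of K = 5 memory cells and all names below are illustrative assumptions; the talk itself specifies none of them. The conventional encoder shifts its register, while the time-varying encoder overwrites the cell under a rotating pointer and rotates the generator taps to match.

```python
def shift_register_encoder(bits, generators, K):
    """Conventional encoder: cells[0] always holds the newest bit."""
    cells = [0] * K
    out = []
    for b in bits:
        cells = [b] + cells[:-1]  # shift everything by one place
        out.append(tuple(sum(g[j] & cells[j] for j in range(K)) % 2
                         for g in generators))
    return out


def pointer_encoder(bits, generators, K):
    """Time-varying encoder: no shifting, the pointer selects the cell to replace."""
    cells = [0] * K
    pointer = 0
    out = []
    for b in bits:
        cells[pointer] = b  # overwrite the oldest bit in place
        # The bit that arrived j steps ago sits at (pointer - j) mod K,
        # so tap j of every generator is rotated onto that cell.
        out.append(tuple(sum(g[j] & cells[(pointer - j) % K] for j in range(K)) % 2
                         for g in generators))
        pointer = (pointer + 1) % K  # cycle back to the first cell at the end
    return out


# Hypothetical rate-1/2 tap patterns over K = 5 cells (newest bit first).
GENERATORS = ((1, 0, 0, 1, 1), (1, 1, 1, 0, 1))
demo_bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
conventional = shift_register_encoder(demo_bits, GENERATORS, K=5)
rotating = pointer_encoder(demo_bits, GENERATORS, K=5)
print(conventional == rotating)
```

The pointer position is the only extra state, and it is known to the receiver from the symbol count alone.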

The  time  unraveled  state  diagram  of  this 
encoder  and,  of  course,  the  conventional  shift 
register  encoder  follows  directly  from  its 
picture  and  looks  like  an  endless  succession  of 
FFTs placed one after the other in a line with
the  right  hand  circles  of  one  overlapping  the 
left  hand  circles  of  the  next.  If  a  suitable 
decoding  procedure  is  used  it  is  easy  to  show 
by  using  this  time  unraveled  state  diagram 
that  the  memory  of  an  outer  code  interleaver 
can  be  reduced  without  increasing  the 
probability  of  there  being  an  error  somewhere 
in  the  interleaving  block. 

Only  one  special  property  of  the  time 
unraveled  state  diagram  is  necessary  in  order 
to  analyze  the  information  flow  in  the  decoder, 
viz.  the  two  paths  which  leave  a  node  remain 
completely  separate  for  K-2  time  steps.  This 
observation  is  sufficient  to  show  the  following 
upper  bound  on  the  average  residence  time  of  a 
token  in  a  module  where  X  is  the  total  number 
of  state  pairs  which  the  module  can  hold: 

$$T_{\max\,\mathrm{avg}} \le \frac{1 + \log(X) + \cdots}{1 - 2^{(\log(X) - K + 2)}}$$

Surprisingly, in certain special cases
this very coarse bound can be achieved with
equality, as the following time-unraveled state
diagram illustrating the split of a K=5 decoder
into two equal parts shows:

[Figure: time-unraveled state diagram of a K=5 decoder split into two equal parts; axis label: bits in]
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ON  THE  MAXIMUM  DIFFERENCE  BETWEEN  PATH 
METRICS  IN  A  VITERBI  DECODER 

Andries  P.  Hekstra* 


Abstract 

The number of bits to be used in the path metric calculus of Viterbi
decoders depends logarithmically on the maximum possible difference
between any two path metrics. Here, a recent upper bound
of Alston and Chau is generalised. In addition, we obtain an
easy-to-compute exact expression for the maximum path metric
difference.

Introduction 

Correct operation of the well-known Viterbi algorithm depends
only on differences of path metrics. As shown in [1], two's complement
arithmetic can faithfully represent these metric differences,
which leads to an efficient implementation. The number of bits to
be used in this calculus depends on the maximum possible difference
$\max\{\Delta pm\}$ of any two path metrics at an arbitrary depth L
in the trellis. Let $s_{\max}$ denote the maximum symbol metric. Assuming
nonnegative symbol metrics, an elementary upper bound
on the maximum path metric difference $\max\{\Delta pm\}$ is [1, 2, 3]

$$\max\{\Delta pm\} \le s_{\max}\, n\, m \qquad (1)$$

where m denotes the memory order of the rate R (= k/n) convolutional
code, i.e., the maximum shift register length [3].

Recently, Alston and Chau obtained a new upper bound for
decoders of binary R = 1/n codes, under certain assumptions on
the metric function [2],

$$\max\{\Delta pm\} \le s_{\max} \lfloor n(m+1) - \cdots \rfloor \qquad (2)$$

The assumptions of Alston and Chau can be simplified to the following.

I Symbol metrics are nonnegative integers.

II A branch metric is the sum of n symbol metrics.

III If transmission is over a noiseless channel and the
transmitted bit differs from the hypothesis bit, the
maximum symbol metric $s_{\max}$ is assigned;
otherwise the symbol metric equals zero.

If negative branch metrics can occur, the addition of a constant
to all branch metrics will not affect the operation of the Viterbi
decoder. For binary codes, it can be shown that Assumption III
does not entail a loss of generality either, again because path selection
is determined by the difference of symbol metrics. The
difference of a symbol metric given that a zero was sent and the
symbol metric given that a one was sent can always be negative,
irrespective of Assumption III.

We show that reception of the all zeroes sequence constitutes
the worst case for $\max\{\Delta pm\}$, for any depth $L \ge m$. As a result,
we obtain an easy-to-compute, exact expression for $\max\{\Delta pm\}$
(Theorem I) and a generalisation of the Alston and Chau bound
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(Theorem  II).  By  the  hard  decision  path  metric,  we  mean  the  path 
metric if the (soft decision) Viterbi decoder were of the hard decision
type and the same signal were received. Since these results
were obtained, we have learned that an exact expression
for maximum path metric differences can also be found in a NASA
internal report.

Theorem  I 

Under the Assumptions I-III, for any $(n, k, m, M, d_{\mathrm{free}})$ binary
convolutional code and for an arbitrary received sequence, the maximum
path metric difference $\max\{\Delta pm\}$ equals $s_{\max}$ times the
maximum hard decision path metric of a final state in a trellis
L branches deep, starting from the all-zero state (for arbitrary
$L \ge m$).

Let M denote the total number of shift register cells in the
encoder. Throughout the paper, a realization of the encoder as a
parallel combination of k shift registers $i = 1, 2, \ldots, k$, for which
the i-th shift register is $m_i$ cells long [3], is assumed. The memory
order m is defined as the maximum of the $m_i$, and $M = \sum_{i=1}^{k} m_i$.
Then, $\Delta M$ is defined as

$$\Delta M = km - M. \qquad (3)$$

Of  course,  for  direct  evaluation  of  Theorem  I  the  case  L  =  m 
is  the  easiest  to  evaluate.  However,  values  L  >  m  can  be  used  to 
obtain an analytical upper bound to $\max\{\Delta pm\}$.
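
Theorem I lends itself to direct evaluation, as the following sketch suggests. It is restricted to rate-1/n feed-forward codes, uses the 4-state (7,5) code and a symbol metric range of 0..7 purely as assumptions, and its function name is hypothetical. A Viterbi recursion is run from the all-zero state on the all-zero received sequence with Hamming branch metrics, and the largest final-state metric is scaled by the maximum symbol metric.

```python
def max_hard_decision_metric(generators, memory, depth):
    """Largest survivor metric over all final states, all-zero word received."""
    n_states = 1 << memory
    inf = float("inf")
    metrics = [inf] * n_states
    metrics[0] = 0  # the trellis starts in the all-zero state
    for _ in range(depth):
        updated = [inf] * n_states
        for state in range(n_states):
            if metrics[state] == inf:
                continue
            for bit in (0, 1):
                register = (bit << memory) | state
                # Hamming distance of the branch label to the all-zero symbols
                branch = sum(bin(register & g).count("1") % 2 for g in generators)
                nxt = register >> 1
                updated[nxt] = min(updated[nxt], metrics[state] + branch)
        metrics = updated
    return max(metrics)


# Assumed example: (7,5) code, n = 2, m = 2, maximum symbol metric 7.
S_MAX, N_OUT, MEMORY = 7, 2, 2
exact = S_MAX * max_hard_decision_metric((0b111, 0b101), MEMORY, depth=MEMORY)
elementary = S_MAX * N_OUT * MEMORY  # bound (1)
print(exact, elementary)
```

For this example the exact value is below the elementary bound (1), and evaluating at a larger depth gives the same result.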

Theorem  II  (Generalized  Alston  and  Chau  bound) 

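Under the Assumptions I-II, for any $(n, k, m, M, d_{\mathrm{free}})$ binary
convolutional code, the maximum path metric difference is upper
bounded by

$$\max\{\Delta pm\} \le s_{\max} \min\{\, \lfloor n(m+\delta) - d_{\mathrm{free}}(1 - 2^{-(\Delta M+\delta)}) \rfloor \;\mid\; \delta = 0, 1, \ldots \,\}. \qquad (4)$$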
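
A numerical reading of bound (4) is sketched below. It assumes that the damaged display is to be read as s_max times the minimum over delta of floor(n(m + delta) - d_free(1 - 2^-(Delta M + delta))); the function name and the cut-off `max_delta` are hypothetical, the latter relying on the bracketed term growing with delta once delta is large.

```python
from fractions import Fraction
from math import floor


def generalized_bound(n, m, d_free, delta_M, s_max=1, max_delta=32):
    """Evaluate the assumed form of bound (4) over delta = 0..max_delta."""
    best = None
    for delta in range(max_delta + 1):
        # exact rational arithmetic avoids rounding problems before the floor
        term = n * (m + delta) - d_free * (1 - Fraction(1, 2 ** (delta_M + delta)))
        value = floor(term)
        if best is None or value < best:
            best = value
    return s_max * best


# Assumed example: (7,5) code, n = 2, k = 1, m = M = 2, so Delta M = km - M = 0.
bound = generalized_bound(n=2, m=2, d_free=5, delta_M=0, s_max=7)
print(bound)
```

For this example the minimum is attained at delta = 1 and coincides with the exact value obtained from a direct evaluation of Theorem I for the same code.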
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ABSTRACT 

The Viterbi Algorithm (VA) is the optimum decoding algorithm
for a convolutional code. Improvements in the performance
of a concatenated coding system that uses VA decoding (inner decoder)
can be obtained when, in addition to the standard output,
an indicator of the reliability of the VA decision is delivered to the
outer stage of processing. Two different approaches of extending
the  VA  are  considered.  In  the  first  approach,  the  VA  is  extended 
with  a  Soft  Output  unit  (SOVA)  that  calculates,  based  on  the 
difference  between  the  cumulative  metrics  of  the  two  paths  merg¬ 
ing  at  each  time  instant  and  state,  reliability  values  for  each  of 
the  decoded  information  symbols.  In  the  second  approach,  coding 
gains  are  obtained  by  delivering,  in  addition  to  the  best  path,  the 
next L − 1 best estimates of the transmitted data sequence. Here,
the  output  format  is  a  list  of  size  L.  This  is  a  List  VA  (LVA). 
In this work, we evaluate LVA and SOVA in comparison to each
other and attain extended versions of LVA and SOVA with low
complexity that implement the other algorithm. We construct
and evaluate a List-SOVA using the reliability information of the
SOVA to generate a list of size L and that also has a lower complexity
than the LVA for a long list size. Further, we introduce a
low complexity algorithm that accepts the list output of the LVA
and calculates for each of the decoded information bits a reliability
value. The complexity and the performance of this Soft-LVA
are a function of the list size L. The performances of Soft-LVA and
SOVA are compared in concatenated coding systems.

SUMMARY 

The Soft-Output Viterbi Algorithm (SOVA) [1] and the Generalized
Viterbi Algorithm (GVA) or List output VA (LVA) [2] are
further extended and compared in this paper. The LVA produces
a  list  of  the  L  best  estimates  of  the  transmitted  data  sequence. 
The  SOVA,  however,  generates  sequentially  and  continuously  soft 
output  symbols,  where  the  amplitude  of  each  symbol  contains 
the  reliability  information  for  that  specific  symbol.  Two  differ¬ 
ent  units  are  developed,  both  using  the  reliability  information  to 
produce  a  list  of  size  L.  Assuming  an  ideal  outer  code,  the  per¬ 
formances  of  these  two  List-SOVAs  (SOVA  and  List  Generating 
Units)  are  compared  to  the  performance  of  the  LVA  for  the  Gaus¬ 
sian  and  independent  Rayleigh  fading  channels.  A  Soft  Symbol 
Output Algorithm is defined, using the differences in the accumulated
metrics of the best path and the $l$-th best paths ($1 < l \le L$)
of the LVA as a measurement for the reliability of each decoded
bit.  The  serial  LVA  (SLVA)  [2]  generates  this  list  iteratively.  A 
new  software  implementation  of  the  SLVA  is  presented.  The  new 
Soft-LVAs  and  the  SOVA  are  tested  in  a  concatenated  coding  sys¬ 
tem,  where  a  convolutional  code  is  chosen  as  the  outer  code.  The 
two algorithms of interest, LVA and SOVA, evaluate the reliability
of the VA decisions and deliver the attained information
to an outer stage of processing. In case of erroneous decisions of
the  VA,  we  observe  in  the  VA  outputs  correlated  information  bit 
errors  and  the  SOVA  outputs  correlated  symbol  reliability  values. 

Motivated by the idea of finding algorithms with lower complexity,
which produce list output (long list size L) or soft output,
we extended the two algorithms by additional units to produce
output according to the other algorithm. We defined a low complexity
list generating unit that accepts the deinterleaved SOVA
output of length N and generates, by using the reliability values
as "differences" for a 1-state SLVA (new implementation) decoding
process, a list of length L. LVA and List-SOVA achieve equal
coding  gains  (for  the  Gaussian  channel)  of  about  1.0  dB  for  a  list 
size  L  =  2  and  1.5  dB  for  a  list  size  L  =  3  compared  to  the  VA 
performance.  Due  to  the  higher  slope  of  the  error  probability  of 
the  List-SOVA,  the  inferior  performance  for  low  SNRs  changes 
into  a  superior  performance  for  high  SNRs  when  compared  to 
the LVA. We attribute these results to the use of an interleaver for
the List-SOVA. With the List-SOVA we attain an alternative List
Output Viterbi Algorithm that achieves, at the cost of higher decoding
delays (interleaving) than the LVA, a superior performance
for high SNRs. For a short list size the complexity of the List-SOVA
is higher than the complexity of the LVA. We propose in
future work to study the List-SOVA performance as a function of
the  update  length  to  obtain  possibly  even  for  a  short  an 
acceptable  performance. 
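
One way to picture a list generating unit of this kind is sketched below. It is an illustration under assumptions, not the unit developed in the paper: each reliability value is treated as the cost of flipping the corresponding hard decision, reliabilities are taken to be nonnegative, and the L candidates are the bit patterns with the L smallest total flipping costs, found with a heap. All names are hypothetical.

```python
import heapq


def list_from_reliabilities(hard_bits, reliabilities, list_size):
    """Return up to list_size (bits, cost) candidates in order of increasing cost."""
    order = sorted(range(len(hard_bits)), key=lambda i: reliabilities[i])
    rel = [reliabilities[i] for i in order]  # ascending flipping costs
    results = [(0.0, ())]  # best candidate: flip nothing
    heap = [(rel[0], (0,))] if rel else []
    while heap and len(results) < list_size:
        cost, subset = heapq.heappop(heap)
        results.append((cost, subset))
        last = subset[-1]
        if last + 1 < len(rel):
            # extend the flip set by the next least reliable position ...
            heapq.heappush(heap, (cost + rel[last + 1], subset + (last + 1,)))
            # ... or replace its last position by that next one
            heapq.heappush(heap, (cost - rel[last] + rel[last + 1],
                                  subset[:-1] + (last + 1,)))
    candidates = []
    for cost, subset in results[:list_size]:
        bits = list(hard_bits)
        for j in subset:
            bits[order[j]] ^= 1  # flip the selected hard decisions
        candidates.append((bits, cost))
    return candidates


# Made-up SOVA output of length N = 4 and a list of size L = 3.
candidates = list_from_reliabilities([1, 0, 1, 1], [2.5, 0.4, 1.0, 3.0], 3)
for bits, cost in candidates:
    print(bits, cost)
```

The second candidate flips the least reliable bit, and the third flips the second least reliable bit, because that costs less than flipping both.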

We introduce a low complexity Soft Symbol Output Viterbi
Algorithm, based on the LVA (Soft-LVA), that uses the knowledge
of the positions where the L best paths differ and the cumulative
metrics at state sn of the L best paths, to produce reliability
information for each of the decoded information bits. Because
the L best path sequences differ only at a limited number
of information bits, we discovered that "soft" initialization of
the reliability values achieves larger coding gains than a scheme
with fixed initialization values.
For the preliminary "soft" initialization method that is
based on a SOVA update (obtained from the SLVA) with update
length 1, we achieve, e.g., for a list size of L = 2 (low
complexity), in comparison to hard output decoding a coding gain
of  0.5  dB  at  10”^  for  codes  with  memory  3  for  the  inner  and  outer 
code  and  code  rates  1/2  (inner)  and  2/3  (outer).  With  list  size 
L  =  8,  1  dB  coding  gain  is  achieved.  We  assume  that  in  the 
first  8  estimates  of  the  sequence  the  significant  error  events  in  the 
VA  output  are  considered  for  block  lengths  N  =  32, 64, 128, 512. 
Compared  to  the  SOVA  the  proposed  Soft-LVA  has  0.2  dB  lower 
coding  gain  at  a  bit  error  probability  of  10“^,  N  =  L  = 
8. Concerning the complexity of the algorithms, the Soft-LVA,
especially when the SLVA is used, has a lower complexity than
the SOVA [3].
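
The idea behind the Soft-LVA can be illustrated with a short sketch. The update rule used here is an assumption made for illustration, in the spirit of the SOVA: every bit in which the l-th best path differs from the best path receives the smallest metric difference seen so far, and bits in which no listed path differs keep a fixed initialization value. Lower metrics are assumed to be better, and all names are hypothetical.

```python
def soft_values_from_list(paths, default_reliability):
    """paths: list of (bits, metric) pairs, best path first."""
    best_bits, best_metric = paths[0]
    reliability = [default_reliability] * len(best_bits)
    for bits, metric in paths[1:]:
        difference = metric - best_metric  # accumulated metric difference
        for i, (a, b) in enumerate(zip(best_bits, bits)):
            if a != b:  # position where the two paths disagree
                reliability[i] = min(reliability[i], difference)
    return best_bits, reliability


# Made-up list of size L = 3 for a block of 4 information bits.
example_paths = [
    ([0, 1, 1, 0], 3.0),
    ([0, 1, 0, 0], 4.5),
    ([1, 1, 0, 0], 6.0),
]
decoded, reliability = soft_values_from_list(example_paths, default_reliability=8.0)
print(decoded, reliability)
```

Positions untouched by the list keep the fixed value, which is where a "soft" initialization would supply a bit-dependent value instead.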
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A recent publication by Forney [1] has increased the interest paid
to trellis decoding of block codes and different combined coding
and modulation techniques. Partial unit memory (PUM) codes
introduced in [2] have advantages of both block and convolutional
codes. Soft maximum likelihood decoding (SMLD) based on a trellis
structure can be obtained for these codes; however, such a decoder
will have a huge number of states and branches. The problem of
reducing the complexity of such decoders has been investigated in [3],
where the trellis decoding derived was based on the technique described by
J. Wolf [4]. In [5] a new simple algorithm for trellis design, using
the generator matrix of array codes, was proposed. This algorithm
allows one to derive the trellis diagram for any array code with a reduced
number of states and branches.

In this paper a new low complexity SMLD for PUM codes,
based on [5], is introduced. It is shown that the new technique
provides the lowest implementation complexity together with better
distance properties in comparison with conventional techniques. Let
$\{X\} = \{(x_1, x_2, \ldots, x_k)\}$ be the sequence of input information symbols of
length k, and let the sequence of output codewords of length n (n > k),
$\{Y\} = \{(y_1, y_2, \ldots, y_n)\}$, be defined as

$$Y_t = X_t G_0 + X_{t-1} G_1 \qquad (1)$$

where $G_0$ and $G_1$ are $k \times n$ matrices. If $\mathrm{rank}(G_0) = k$ and
$\mathrm{rank}(G_1) = k_1 < k$, such a code is known as a PUM code [2]. Matrices $G_0$
and $G_1$ are defined as follows:

$$G_0 = \|G_{00}\; G_{01}\|^T, \qquad G_1 = \|0\; G_{11}\|^T \qquad (2)$$

where $G_{00}$ is a generator matrix of an $(n, k_0)$ block code ($k_0 = k - k_1$); $G_{01}$
and $G_{11}$ are $(k_1 \times n)$ matrices of rank $k_1$; $\|0\|$ is the $k_0 \times n$ all-zero matrix and
$\|\cdot\|^T$ is the transpose of matrix $\|\cdot\|$. In order to design the PUM code
on the basis of a block code, we choose $G_{00}$ as a generator
matrix of an array code, but matrices $G_{01}$ and $G_{11}$ must satisfy the
following conditions:

(i) the distances of the block codes generated by matrices $G_{01}$ and $G_{11}$
must be no less than $d/2$, where $d$ is the minimum distance of the array code;

(ii) matrix

$$G = \|G_{00}\; G_{01}\; G_{11}\|^T \qquad (3)$$


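Example. We describe an array code (9,4,4) with generator matrix $G_{00}$, together with the matrices $G_{01}$ and $G_{11}$:

$$G_{00} = \begin{Vmatrix} 101000101 \\ 011000011 \\ 000101101 \\ 000011011 \end{Vmatrix}, \quad
G_{01} = \begin{Vmatrix} 101000000 \\ 011000000 \end{Vmatrix}, \quad
G_{11} = \begin{Vmatrix} 000100001 \\ 001000100 \end{Vmatrix}$$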

Using the trellis diagram of the array code [5], the trellis structure of
the PUM code can be derived easily. This code has the following
parameters: n=9, k=6 and $d_{\mathrm{free}}$=4. The designed trellis diagram
allows the SMLD of the PUM code to be implemented with much lower
complexity than with conventional techniques. Table 1
compares the complexity of the trellis diagrams and the number of
operations for a conventional Viterbi decoder (VD), SMLD using
J. Wolf's trellis structure of block codes [3], and the PUM code
proposed in the above Example. As follows from Table 1, the
proposed algorithm achieves the lowest complexity.
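
The parameters of the Example can be checked numerically. The sketch below reads the three matrices of the Example as the 4 x 9 matrix G00 and the 2 x 9 matrices G01 and G11, and it assumes the usual partial unit memory encoding rule, in which the current output block is the current input block times G0 plus the previous input block times G1, all over GF(2). The helper names are hypothetical.

```python
from itertools import product


def rows(*strings):
    return [[int(c) for c in s] for s in strings]


G00 = rows("101000101", "011000011", "000101101", "000011011")
G01 = rows("101000000", "011000000")
G11 = rows("000100001", "001000100")
G0 = G00 + G01                                 # k x n with k = 6, n = 9
G1 = [[0] * 9 for _ in range(len(G00))] + G11  # k0 all-zero rows, then G11


def encode_block(info, G):
    """Vector-matrix product over GF(2)."""
    return [sum(u * row[j] for u, row in zip(info, G)) % 2 for j in range(len(G[0]))]


def min_distance(G):
    """Minimum weight over all nonzero codewords (G assumed to have full rank)."""
    return min(sum(encode_block(info, G))
               for info in product((0, 1), repeat=len(G)) if any(info))


def rank_gf2(G):
    """Rank over GF(2), using an XOR basis indexed by leading bit position."""
    pivots = {}
    for row in G:
        v = int("".join(map(str, row)), 2)
        while v:
            top = v.bit_length()
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


def pum_encode(blocks):
    """Assumed PUM rule: y_t = x_t G0 + x_(t-1) G1, starting from a zero block."""
    previous = [0] * len(G0)
    out = []
    for x in blocks:
        a, b = encode_block(x, G0), encode_block(previous, G1)
        out.append([(p + q) % 2 for p, q in zip(a, b)])
        previous = x
    return out


print(min_distance(G00), rank_gf2(G0), rank_gf2(G1))
print(min_distance(G01), min_distance(G11))
codewords = pum_encode([[0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0]])
```

With this reading, the array code has distance 4, G0 has rank k = 6, G1 has rank 2, and the codes generated by G01 and G11 both have distance 2, that is, half the distance of the array code.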


Table 1.

Decoding Algorithm   No of Additions   No of Comparisons   No of Branches   No of Nodes
VD                   2048              240                 256              8
SMLD [3]             296               148                 296              152
Example              172               75                  100              28

In addition, the technique described above makes it possible to improve the
distance properties of PUM codes.
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Summary 

The evaluation of the first event error probability $P_e$ of trellis codes
is a difficult and complex problem. The best known approach is
the truncated union bound $P_{ub}$ on $P_e$, but even the evaluation
of $P_{ub}$ is rather complex for most codes, due to the nonlinear
or nonregular structure of typical trellis codes, which requires a
double summation over all sequences, i.e.,


where x, x' are the sequences of the code. The complexity of
the search algorithm is thus proportional to $N^2$, where N is the
number of states in the code. The above equation is often written
in  terms  of  the  distance  spectrum  as 

where the infinite set of pairs $\{d^2, A_d\}$ is the distance spectrum
and $d^2_{\mathrm{free}}$ is the minimum squared Euclidean distance of the code.
We  thus  concentrate  on  evaluating  the  distance  spectra  of  these 
codes. 
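
For orientation, the two forms of the union bound referred to above are commonly written as follows for the additive white Gaussian noise channel. This is the standard textbook form and not a transcription of the displays in this summary; $p(x)$ (the probability of the correct sequence $x$), $P(x \to x')$ (the pairwise error probability), $N_0$ (the one-sided noise power spectral density) and $Q(\cdot)$ (the Gaussian tail function) are assumed notation.

$$P_e \le \sum_{x} p(x) \sum_{x' \ne x} P(x \to x'), \qquad P(x \to x') = Q\!\left(\sqrt{\frac{d^2(x, x')}{2N_0}}\right)$$

$$P_e \le \sum_{d^2 \ge d^2_{\mathrm{free}}} A_d \, Q\!\left(\sqrt{\frac{d^2}{2N_0}}\right)$$

In the second form $A_d$ is the average number of error paths at squared Euclidean distance $d^2$, and truncating the sum after a finite number of spectral lines gives the truncated bound.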

A lot of effort has gone into designing regular trellis codes
and the conditions for regularity (or geometric uniformity) are well
understood now [1]. The reason why regular codes are so popular
is that their error performance can be evaluated by regarding
a single, arbitrary correct sequence, i.e., the double summation
above is reduced to a single summation. The complexity gain
thus achieved is significant and searching a trellis with N states,
where N is the number of code states, is considered acceptable. In
this sense, calculating the error performance of regular codes is
equivalent in complexity to calculating the error performance of
linear codes.

Various researchers [2, 3, 4, 5] have successfully looked at reducing
the complexity of calculating the union bound of trellis
codes, using the linear structure of the underlying generating trellis.
All these methods are essentially equivalent [6], in the sense
that the linearity of the code generating the trellis is extended to
the average symbol sequence.

In this paper we look at this problem from the perspective of
graph theory and finite-state machines. The problem of
calculating the union bound can be formulated as finding all possible
paths through a graph with $N^2$ states, where N is the number
of code states. We show that the number of vertices of this graph
can be reduced in many cases. We show that the conditions
for quasi-regularity in [3] (or row and column uniformity in [4])
lead to sets of equivalent states, reducing the size of the graph
to N vertices. This approach does not explicitly use the linearity
of the code generating the trellis and is thus applicable to
nonlinear trellis codes such as some of the rotationally invariant
codes.

The distance spectrum for some of these trellis codes can actually
be calculated using a graph with fewer than N states, as in
the case of the 4-state 8-PSK Ungerboeck code, whose associated
Euclidean distance graph has only 3 states. We expound on this
and look at how the distance graph can be used to obtain bounds
for other distance measures which can be computed with small
complexity.

Ways of loosening the bound which lead to complexity reductions,
and the tightness of the bounds, will also be addressed.
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Abstract 

A recursive algorithm is presented for accomplishing
maximum likelihood soft syndrome decoding of trellis coded
modulation codes. It consists of signal-by-signal hard decoding,
followed by a search for the most likely error pattern. An
error trellis, alternatively a decoding table, is devised for
describing the decoding procedure. Methods for degenerating the
error trellis enable identification of the surviving error path by
a relatively small number of real additions. In comparison with
the Viterbi algorithm, the syndrome decoder achieves substantial
reduction in the computational complexity, especially for
moderately noisy channels.

Summary

Trellis coded modulation (TCM) codes are usually generated
by employing a k/(k+1) rate binary convolutional encoder.
The k+1 coded bits select one of $2^{k+1}$ subsets of a redundant
$2^{k+m+1}$-ary signal set, while the m uncoded bits determine
which of the $2^m$ signals of this subset is to be transmitted. The
signals are transmitted through an additive white Gaussian
noise channel, hence maximum likelihood (ML) decoding is
equivalent to minimum squared Euclidean distance decoding.

The ML decoding of TCM codes is customarily accomplished
in two steps: a) within each subset of signals, the nearest
neighbour to the received signal is determined by a procedure
called subset decoding, then b) the Viterbi algorithm is applied
for finding the signal path through the codeword trellis with
the minimum squared Euclidean distance from the sequence of
received signals [2]. We replace step b) by an efficient ML syndrome
decoding algorithm, suited to deal with nonbinary modulation
signals.

Let H be an infinite dimensional parity check matrix of a
(k+1, k) binary convolutional code C employed for a TCM
scheme. Given some received sequence of channel output signals,
$r = (r_1, r_2, r_3, \ldots)$, signal-by-signal hard decision yields
$v = (v_1, v_2, v_3, \ldots)$. By hard decision we mean finding the closest
code signal to the received signal in terms of squared Euclidean
distance. The subset decoding is also accomplished as
part of the hard decision procedure. We then expand each
signal in v into its k+1 coded bits representation (i.e., discard
the uncoded bits). Subsequently, we compute the corresponding
sequence of syndrome bits $z = (z_1, z_2, z_3, \ldots)^t$, defined by
$z = H v^t$ (where the superscript t indicates transposition). A
measure of confidence, named confidence value, is assigned to
each of the $2^{k+1}$ signals,

c,'  :  j  -0,l,2,"-,2‘+*-l  , 

belonging  to  the  reduced  sigaal  set  (i.e.,  the  sigaals  determiaed 

by  the  sabset  decodiag).  The  confidence  value,  denoted  p|^,  ia 
defined  by 

-J*(Ci^  .ti)  -  il^fvi.ri), 

where $d^2(\cdot,\cdot)$ stands for squared Euclidean distance. The
columns of H are partitioned into sets of $k+1$ consecutive columns
each, called bands. The confidence value of the $j$th signal at time
$i$ is then also regarded as the confidence value of the $j$th subband
(i.e., subset) of the $i$th band of H. The weight of a collection of
subbands belonging to different (not necessarily consecutive) bands
is defined to be the sum of the confidence values of the elements of
the collection.

This research was supported in part by the Basic Research
Foundation administered by the Israel Academy of Sciences
and Humanities.
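The hard decision and confidence value computation described above can be sketched as follows. This is only an illustration under stated assumptions: the constellation is taken to be unit-energy 8-PSK, the full constellation stands in for the reduced signal set produced by subset decoding, and the function names are hypothetical.

```python
import cmath
import math


def psk_constellation(m):
    """Unit-energy m-PSK signal points as complex numbers (assumed signal set)."""
    return [cmath.exp(2j * math.pi * j / m) for j in range(m)]


def sq_dist(a, b):
    """Squared Euclidean distance between two complex signal points."""
    return abs(a - b) ** 2


def hard_decision(received, signals):
    """Index of the signal closest to the received sample."""
    return min(range(len(signals)), key=lambda j: sq_dist(signals[j], received))


def confidence_values(received, signals):
    """Return the hard decision index and d^2(c_j, r) - d^2(v, r) for every j."""
    v = hard_decision(received, signals)
    base = sq_dist(signals[v], received)
    return v, [sq_dist(c, received) - base for c in signals]


signals = psk_constellation(8)
v, conf = confidence_values(0.9 + 0.3j, signals)
# The hard decision has confidence value zero; every other signal has a
# non-negative value that grows with its distance from the received sample.
print(v, [round(c, 3) for c in conf])
```

A small confidence value marks a signal that is almost as close to the received sample as the hard decision, so it is a cheap candidate when the syndrome indicates an error.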

The decoding procedure starts with the computation of $z$. If
$z = 0$ then $v$ is the most likely sequence of coded bits.
Otherwise, ML decoding is achieved by finding the least weighing
error collection of subbands that sum up to $z$ and then
complementing the bits of $v$ at the positions corresponding to the
columns included in this collection. An error trellis is devised
for compactly describing all the possible error collections. The
trellis is degenerated according to the composition of the
syndrome sequence.

The following Table exhibits a comparison between computational
complexities of the Viterbi algorithm and the proposed
syndrome decoding algorithm, applied to a simple four state
trellis code for 8-PSK modulation. The two algorithms will
decode to the same code sequence and thus give identical
performance. However, the Viterbi decoder's computational
complexity is independent of the channel signal to noise ratio
(SNR) while the syndrome decoder's average computational
complexity decreases as the SNR increases (in similarity with
the case of syndrome decoding of binary convolutional codes).
The complexity is measured in terms of the total numbers
of real additions and comparisons required for decoding
300-bit truncated sequences. The average complexities were
obtained by simulations. The worst case complexities are fixed
and independent of the SNR.


In Full State (FS) Viterbi decoding, $N = N_s \times N_{ch}$ is the number
of states of the FS trellis and of a super-encoder which includes the
convolutional encoder and the channel, whose numbers of states are $N_s$
and $N_{ch}$ respectively. In Reduced State (RS) schemes [1], reducing
the channel states to $N'_{ch}$ ($1 \le N'_{ch} < N_{ch}$) also diminishes
the size of the FS super-encoder mapper, giving an $N'$-state RS
super-encoder with a smaller mapper. Thus, for each state (or symbol) in
the RS trellis we have $p = N_{ch}/N'_{ch} = N/N'$ unknown states (or
symbols), where the symbols are estimated by the Viterbi decoder and
then fed back to equalize the unknown ones in the next decoding step.

If, among the p potential survivors, an incorrect sequence is selected,
the error propagation (EP) effect occurs. The feedback mechanism
prevents the attainment of error probability bounds from the RS state
transition diagram. To overcome this difficulty, the $N'$ reduced states
can be combined with the p unknowns to obtain the N-state FS
trellis ($N'p = N$), with labels including symbol estimations. Since
in RS decoding at least two error events are involved, we consider
the occurrence of multiple error events in the FS trellis resultant from
splitting of the RS states, where all the states $s_0, \ldots, s_{p-1}$ are
seen the same by the decoder. Its upper bound is obtained as

$P_e \le \sum_{K=1}^{\infty} \sum_{L=1}^{\infty} \sum_{S_{K,L}} \sum_{i=0}^{p-1} \sum_{j=1}^{p-1} P_c[S_{K,L}] \, P_K[s_i] \, P[S_{K,L} \to S'_{K,L} \mid s_i \to s_j]$   (1)

where the correct sequence is denoted by $S_{K,L}$ and its occurrence
probability by $P_c[S_{K,L}]$. K stands for the error event order and L
represents its length. $P_K[s_i]$ is the probability that the K-th error
event starts at state $s_i$, and $P[S_{K,L} \to S'_{K,L} \mid s_i \to s_j]$
is the pairwise error probability for the incorrect path $S'_{K,L}$
between $s_i$ and $s_j$. From the first error event definition, the term
($K = 1$) in Eq. (1) carries no EP effect. By assuming that the same
starts from the transmitted state $s_0$, $P_1[s_0] = 1$ and
$P_1[s_i] = 0$ ($i = 1, 2, \ldots, p-1$). For $K \ge 2$, the error events
may start from any $s_i$ but from $s_0$, since at that time at least one
error event has already happened (i.e. $P_K[s_0] = 0$ for $K \ge 2$).
These terms will contain the EP effect. Moreover, all error events will
end at any $s_j$ ($j = 1, 2, \ldots, p-1$), where $s_0$ is excepted since
at the time of RS decision, the transmitted and decoded states are not
fully coincident.

Now, we calculate Eq. (1) by two methods. The first considers
the error weight matrix transfer function [2]. We define two transfer
function matrices: $T_a(z)$, which represents the transfer function of
the first error event starting from $s_0$ and ending at $s_j$
($j = 1, 2, \ldots, p-1$), and $T_b(z)$, which corresponds to the error
events starting from $s_i$ and ending at $s_j$ ($i, j = 1, 2, \ldots, p-1$).
$z$ is a parameter resulting from the Chernoff bound. An element $(i,j)$
of the matrices is an error event probability upper bound associated to
the paths between $s_i$ and $s_j$. Then, we define two other transfer
function matrices $T'_a(z)$ and $T'_b(z)$ with elements $(i,j)$ containing
the pairwise error (rather than the error event) probability upper
bound. The total transfer function matrix then becomes
$T(z) = T'_a(z) + T_a(z)[\mathbf{I} - T_b(z)]^{-1} T'_b(z)$, where the
factor multiplying $T'_b(z)$ gives the starting state probabilities for
$K \ge 2$ and $\mathbf{I}$ is an identity matrix. To obtain the bit error
probability upper bound, we extend $T'_a(z)$ and $T'_b(z)$ to include the
number $r$ of incorrect input bits, resulting in $T'_a(z, I)$ and
$T'_b(z, I)$, where $I$ is an indeterminate whose exponent is $r$. The
total extended transfer function then results in
$T(z, I) = T'_a(z, I) + T_a(z)[\mathbf{I} - T_b(z)]^{-1} T'_b(z, I)$,
and the upper bound of the bit error probability is obtained as in [2].
In the calculation of $T(z, I)$, the Chernoff bound of the pairwise error
probability is taken, and the elements of all the defined matrices are
calculated. We call this the Union Bound (UB).

The second method consists in the estimation, by Gaussian Quadrature
Rules (GQR), of the CDF associated to the pairwise error probability
of Eq. (1). This is based on the calculation of a set of moments
of the r.v. related to the branch metrics. Since the path metric is
constructed by successive branch metric additions, the moments of the
corresponding r.v. are obtained from successive binomial expansions
in the form of moments convolution. When an error event is defined,
such moments are used to estimate the CDF, from which the pairwise
error probability is computed. In this case, we use an algorithm and
include all the error events by traversing a general $N^2$-state trellis
diagram based on pairwise states [3]. The bound thus obtained is called
Moments Bound (MB).

Although the pairwise error probability of Eq. (1), based on the path
metric condition, holds equally for trellises with or without parallel
transitions, in the first case an additional condition is satisfied, since
a decoded branch symbol is, among the members of the parallel
transition, the nearest in Euclidean distance to the received signal.
By union bounding all the sequences arising from parallel transitions
concatenation, an upper bound is obtained. An element taken from it
will have a branch metric that is calculated from a truncated Gaussian
noise pdf, with limits depending on how the decision space is divided
by the parallel transition symbols. The tighter bound thus obtained
is called Elementary Bound (EB).

The results shown below correspond to a 16-QAM TCM scheme,
with $N_s = 4$, $N_{ch} = 8$, $N'_{ch} = 1$. The tight upper bound (TUB) [2]
shown can be applied only for complete Gaussian noise pdf or UB.


Figure 1: RS 16-QAM TCM performance bounds

Abstract

In this paper we present a novel maximum-likelihood soft-decision
decoding algorithm for linear block codes. The approach used here
is to convert the decoding problem into a search problem through
a graph which is a trellis for an equivalent code of the transmitted
code. Algorithm A* is employed to search through this graph.
This search is guided by an evaluation function $f$ defined to take
advantage of the information provided by the received vector and
the inherent properties of the transmitted code. This function $f$ is
used to drastically reduce the search space and to make the decoding
efforts of this decoding algorithm adaptable to the noise level.
Simulation results for the (104,52) binary extended quadratic residue
code and the (128,64) binary extended BCH code are given.

Summary 

The use of block codes is a well-known error-control technique for
reliable transmission of digital information over noisy communication
channels. Linear block codes with good coding gains have been
known for many years; however, these block codes have not been used
in practice for lack of an efficient soft-decision decoding algorithm.

In this paper we present a novel maximum-likelihood soft-decision
decoding algorithm for linear block codes. This algorithm uses
Algorithm A* [3], which is a generalization of Dijkstra's algorithm [2],
to search through the trellis for a code equivalent to the transmitted
code. This search is guided by an evaluation function $f$ defined for
every node $m$ in the trellis, $f(m) = g(m) + h(m)$, where $g(m)$ is an
estimate of the cost of the minimum cost path from the all-zero node
at depth 0 to node $m$, and where $h(m)$ is an estimate of the cost of
the minimum cost path from node $m$ to the all-zero node at depth
$n$. The function $f$ is defined to take advantage of the information
provided by the received vector and the inherent properties of the
transmitted code. The use of this priority-first search strategy for
decoding drastically reduces the search space and results in an efficient
optimal soft-decision decoding algorithm for linear block codes.
Furthermore, in contrast with Wolf's algorithm [4], the decoding efforts
of our decoding algorithm are adaptable to the noise level.
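The priority-first search with $f(m) = g(m) + h(m)$ can be illustrated with a small sketch. It is not the authors' algorithm: it searches the code tree of a (7,4) Hamming code rather than a trellis for an equivalent code, uses BPSK with squared Euclidean cost, and takes for $h$ the simple bound "best achievable cost on every remaining position", which never overestimates. The generator matrix and all function names are assumptions made for the example.

```python
import heapq
import itertools

# Systematic generator matrix of a (7,4) Hamming code (illustrative choice).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
K, N = len(G), len(G[0])


def encode(info):
    """Encode K information bits into an N-bit codeword."""
    return [sum(info[i] * G[i][j] for i in range(K)) % 2 for j in range(N)]


def bit_cost(sample, bit):
    """Squared distance from a sample to the BPSK symbol (0 -> +1, 1 -> -1)."""
    return (sample - (1.0 - 2.0 * bit)) ** 2


def total_cost(received, word):
    return sum(bit_cost(r, b) for r, b in zip(received, word))


def astar_decode(received):
    """Return (ML codeword, number of nodes expanded)."""
    best = [min(bit_cost(r, 0), bit_cost(r, 1)) for r in received]
    tail = [0.0] * (N + 1)  # tail[d] = lower bound on the cost of positions d..N-1
    for i in range(N - 1, -1, -1):
        tail[i] = tail[i + 1] + best[i]
    heap = [(tail[0], 0.0, (), None)]  # entries: (f, g, info prefix, codeword)
    expanded = 0
    while heap:
        f, g, prefix, word = heapq.heappop(heap)
        expanded += 1
        if word is not None:
            return list(word), expanded  # first complete codeword popped is optimal
        for bit in (0, 1):
            new_prefix = prefix + (bit,)
            depth = len(new_prefix)
            new_g = g + bit_cost(received[depth - 1], bit)
            if depth == K:  # parity bits are now determined: exact cost is known
                cw = encode(new_prefix)
                cost = new_g + sum(bit_cost(received[j], cw[j]) for j in range(K, N))
                heapq.heappush(heap, (cost, cost, new_prefix, tuple(cw)))
            else:
                heapq.heappush(heap, (new_g + tail[depth], new_g, new_prefix, None))
    raise RuntimeError("search exhausted without reaching a codeword")


def brute_force_decode(received):
    """Exhaustive ML decoding, used only to cross-check the search."""
    words = (encode(info) for info in itertools.product((0, 1), repeat=K))
    return min(words, key=lambda w: total_cost(received, w))


received = [0.9, -0.2, 1.1, 0.3, -0.8, 0.4, 1.2]
decoded, expanded = astar_decode(received)
print(decoded, expanded)
```

Because $h$ is a lower bound, the first complete codeword removed from the queue has minimum cost, and cleaner received vectors tend to expand fewer nodes, which is the sense in which the effort adapts to the noise level.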

The proposed algorithm is applicable to any linear block code
and does not require the availability of a hard-decision decoder.
Furthermore, any stopping criterion ensuring that a solution has been
found can be easily incorporated into this algorithm.

Simulation  results  for  the  (104,52)  binary  extended  quadratic 
residue  code  and  the  (128, 64)  binary  extended  BCH  code  when  these 
codes  are  transmitted  over  the  AWGN  channel  are  given  in  tables  1 
and  2,  respectively.  These  results  were  obtained  by  simulating  35,000 
samples  for  each  SNR. 


Table 1: Simulation for the (104,52) code

Table 2: Simulation for the (128,64) code

where

$N(r)$ = the number of nodes visited during the decoding of $r$;
$C(r)$ = number of codewords constructed in order to decide on the
closest codeword to $r$;
$M(r)$ = maximum number of nodes stored during the decoding of $r$;
max = maximum value among 35,000 samples;
ave = average value among 35,000 samples;
$\gamma_b = E_b/N_0$.

Simulation results for the above linear block codes attest to the
fact that this decoding technique drastically reduces the search space,
especially for the majority of practical communication systems where
the probability of error is less than $10^{-3}$ ($\gamma_b$ greater than
6.8 dB) [1]. For example, the results of Table 2 at 6 dB indicate that
for the 35,000 samples tried, this decoding algorithm is approximately 15
orders of magnitude more efficient, in time and space, than Wolf's.
Thus, this decoding procedure has resulted not only in an efficient
soft-decision decoding algorithm for hitherto intractable linear block
codes, but in an algorithm which is in fact optimal as well.

Abstract

The performance evaluation of a posteriori probability (APP)
decodable codes is considered. In particular, we are interested in
a decoding method which uses an approximation [2] to the
weight functions involved in the decoding process. The channel
is assumed to be additive white Gaussian, and the modulation
format is assumed to be binary phase shift keying (BPSK). Two
cases are considered: in the first case, analog (unquantized)
demodulator output samples are used in soft decision
approximate APP (AAPP) decoding. In the second case, we
assume that the demodulator output samples are quantized using
an analog-to-digital (A/D) converter, and these quantized
samples are utilized in the AAPP decoding process. In both
cases, expressions for the probability of first decoding error are
derived using characteristic functions. We compute the
probability of first decoding error numerically for both block and
convolutional APP decodable codes using the analytical
expressions derived, and these results show good agreement with
simulation results. Finally, some interesting aspects of large block
length threshold decodable codes are discussed [5].

Summary 

APP decoding was introduced by Massey [1]. It provides
a soft decision decoding algorithm that minimizes symbol error
rate for threshold decodable codes [1, 3]. In APP soft decision
decoding, a set of orthogonal parities is computed from the
hard decisions, and each of these parities is assigned a "weight"
based upon the channel reliability information. In exact APP
decoding, these weights are complex non-linear functions of the
channel reliabilities of the components of the parity equations.
Hence, in practice, exact APP decoding is rarely implemented.

In [2], an approximation to the weight function which
depends only on the least reliable component of the parity check
equation was suggested, and shown via simulations to perform
extremely well (degradation relative to exact APP decoding is
less than a tenth of a dB) even at low $E_b/N_0$ for a large set of
threshold decodable codes. This approximation is widely used in
practical realizations of soft decision threshold decoders [4].

In our work, we derive analytical expressions for the
probability of first decoding error for AAPP soft decision
decoding. In our analysis, we assume that already decoded
symbols and their reliabilities are not fed back in the decoding of
subsequent symbols. We also assume, for the purpose of analysis,
a "Type-II" decoder [1, 3] where any orthogonal parity

$B = r_1 \oplus r_2 \oplus \cdots \oplus r_n$   (1)

is an estimate of the orthogonal symbol C. In (1), the $r_i$ represent the
hard decisions on the demodulated matched filter samples $y_i$, and
the symbol $\oplus$ represents modulo-2 addition. In AAPP
decoding, the weight of the parity B, denoted $w(B)$, is given by

$w(B) = \min\{|y_1|, |y_2|, \ldots, |y_n|\}$.   (2)
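A minimal sketch of the parity and its approximate weight, as given in (1) and (2), is shown below. The BPSK mapping (non-negative sample means bit 0) and the function names are assumptions made for the example; the sample values are invented for illustration.

```python
from functools import reduce


def hard_decisions(samples):
    """BPSK hard decisions; the mapping sample >= 0 -> 0 is an assumption."""
    return [0 if y >= 0 else 1 for y in samples]


def parity_and_weight(samples):
    """Return the parity B of the hard decisions and its AAPP weight w(B)."""
    bits = hard_decisions(samples)
    parity = reduce(lambda a, b: a ^ b, bits, 0)  # modulo-2 sum, as in (1)
    weight = min(abs(y) for y in samples)         # least reliable sample, as in (2)
    return parity, weight


samples = [0.8, -1.1, 0.15, 1.3]  # hypothetical matched filter outputs
B, w = parity_and_weight(samples)
signed_vote = w * (2 * B - 1)     # the random variable w(B)(2B - 1)
print(B, w, signed_vote)
```

The weight depends only on the least reliable sample in the parity check, which is what makes the approximation cheap compared with the exact APP weight.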

We proceed to evaluate the conditional characteristic function of
the random variable $w(B)(2B-1)$, conditioned upon the
orthogonal symbol C. We then make use of the fact that, given
the orthogonal symbol, the parity check equations are
independent, and derive the conditional characteristic function of the
decision variable in Type-II AAPP decoding. An efficient
technique to numerically evaluate the probability of first
decoding error from the characteristic function of the decision
variable is also described.

We evaluate the probability of first decoding error for
threshold decodable block codes using the analysis described
above. We compare the numerical results to simulation results,
and show that they are in good agreement. It is shown that when
the demodulator output samples are quantized to K levels and
AAPP decoding is used, the degradation relative to the unquantized
case is about 0.25 dB only when the length of the threshold
decodable code is small. When the length of the threshold
decodable code is large, the degradation is close to 1.0 dB. A
new decoding algorithm [5] for the quantized sample case is
proposed, which eliminates the above mentioned drawback for
threshold decodable block codes with large block length.

Discussions with my colleague Y. Hebron during the course of this
work are greatly appreciated.

Abstract. The problem of efficient maximum-likelihood soft decision
decoding of binary BCH codes is considered. It is known that those primitive BCH
codes whose designed distance is one less than a power of two contain subcodes
of high dimension which consist of a direct sum of several identical codes. We
show that the same kind of direct-sum structure exists in all the primitive BCH
codes, as well as in the BCH codes of composite block length. We also introduce a
related structure termed the "concurring-sum", and then establish its existence in
the primitive binary BCH codes. Both structures are employed to upper bound the
number of states in the proper minimal trellis of BCH codes, and to develop efficient
algorithms for maximum-likelihood soft decision decoding of these codes.

In [2] Forney has shown that the binary Reed-Muller codes contain
direct-sum subcodes of high dimension. It is well known that certain BCH
codes, namely the primitive binary BCH codes with designed distance one
less than a power of two, are supercodes of punctured Reed-Muller codes.
Hence these BCH codes evidently share the direct-sum structure of the RM
codes. This fact was used by Kasami et al. [3] to construct efficient
trellis diagrams for the (64,24,16) and (64,45,8) extended BCH codes, and
also several double-error correcting BCH codes. The following question
hence arises: do other BCH codes also contain direct-sum subcodes of high
dimension? We settle this question affirmatively for all the primitive
BCH codes, and also for the BCH codes of composite block length. The
direct-sum structure is in a sense a counterpart of the concept of
"zero-concurring" codewords of [1, 4], obtained by substituting a code for
each codeword. We also study a different structure, where we allow the
constituent codes to overlap over a fixed set of coordinates. This
concurring-sum structure is the corresponding counterpart of the
"concurring" codewords of [1]. We show the existence of concurring-sum
structures in all the primitive binary BCH codes. Both the direct- and the
concurring-sum structures make it possible to set nontrivial upper bounds
on the number of states in the minimal proper trellis of BCH codes, and
provide a clue for efficient soft-decision decoding.

Let C be a binary BCH code of length $n$ and dimension $k$, let $\alpha$ be a
primitive $n$-th root of unity, and let $\mathcal{I}$ be a subset of
$\{0, 1, \ldots, n-1\}$. Denote by $C[\mathcal{I}]$ the subcode of C which
consists of all those codewords that are nonzero only on the positions
contained in $\mathcal{I}$. Let $C(\mathcal{I})$ be the code obtained
from $C[\mathcal{I}]$ by puncturing out all the positions not in $\mathcal{I}$.

Proposition 1. Let $\mathcal{I}_1$ and $\mathcal{I}_2$ be subsets of the set
$\{0, 1, \ldots, n-1\}$, such that for some $a \in \{0, 1, \ldots, n-1\}$ we
have $\{\alpha^i : i \in \mathcal{I}_2\} = \{\alpha^a \cdot \alpha^i : i \in \mathcal{I}_1\}$.
Then $C(\mathcal{I}_1) = C(\mathcal{I}_2)$.

Now assume that the block length of C is composite, say $n = n_1 n_2$, and
let $Z$ be the set of zero frequencies of C. Define
$S = \{s = z \pmod{n_1} : z \in Z\}$.

Proposition 2. Let $\mathcal{I}_1 = \{0, n_2, 2n_2, \ldots, (n_1-1)n_2\}$. Then
the code $C(\mathcal{I}_1)$ is a BCH code of length $n_1$ and dimension
$k_1 = n_1 - |S|$. The zeros of $C(\mathcal{I}_1)$ lie at
$\{\beta^s : s \in S\}$, where $\beta = \alpha^{n_2}$ is a primitive $n_1$-th
root of unity.

In order to obtain direct-sum subcodes of high dimension in BCH codes of
composite block length, it would now suffice to partition the set
$\{0, 1, \ldots, n-1\}$ into $n_2$ disjoint subsets satisfying the condition
of Proposition 1 with respect to the set $\mathcal{I}_1$ defined in
Proposition 2. Note that the sets $Z$ and $S$ are unions of cyclotomic
cosets modulo $n$ and $n_1$, respectively. Thus the definition of $S$
in conjunction with Proposition 2 induces "coset aliasing" between the
cyclotomic cosets modulo $n$ and modulo $n_1$. In particular, certain high
frequencies of C alias as low frequencies in $C(\mathcal{I}_1)$. This is
intuitively plausible since $\mathcal{I}_1$ is just the "time-domain
sampling" of C.
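The coset aliasing of Proposition 2 can be made concrete with a short computation. The sketch below assumes a narrow-sense binary BCH code whose zero set is the union of the cyclotomic cosets of $1, \ldots, \delta - 1$ modulo $n$; the parameters $n = 15 = 5 \cdot 3$ and $\delta = 3$ and the function names are chosen only for illustration.

```python
def cyclotomic_coset(s, n, q=2):
    """Cyclotomic coset of s modulo n under multiplication by q (gcd(n, q) = 1)."""
    coset, x = set(), s % n
    while x not in coset:
        coset.add(x)
        x = (x * q) % n
    return coset


def bch_zero_set(n, delta):
    """Zero frequencies Z of a narrow-sense BCH code with designed distance delta."""
    zeros = set()
    for s in range(1, delta):
        zeros |= cyclotomic_coset(s, n)
    return zeros


def aliased_zero_set(zeros, n1):
    """The set S = {z mod n1 : z in Z} from the definition before Proposition 2."""
    return {z % n1 for z in zeros}


n1, n2 = 5, 3
n = n1 * n2
Z = bch_zero_set(n, delta=3)           # {1, 2, 4, 8}
S = aliased_zero_set(Z, n1)            # {1, 2, 3, 4}
index_set = [j * n2 for j in range(n1)]  # positions 0, n2, 2*n2, ...
k1 = n1 - len(S)                       # dimension of the sampled code
print(sorted(Z), sorted(S), index_set, k1)
```

In this example the frequency 8 of the length-15 code aliases to frequency 3 of the length-5 code, and the sampled code has dimension 1.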

In the sequel we consider the primitive BCH codes. Henceforth let C
denote an extended primitive narrow-sense BCH code of length $n + 1 = 2^m$.

Proposition 3. Let $\mathcal{I}_1$ and $\mathcal{I}_2$ be subsets of the set
$\{0, 1, \ldots, n-1, \infty\}$, such that for some $a \in \{0, 1, \ldots, n-1\}$
we have $\{\alpha^i : i \in \mathcal{I}_2\} = \{\alpha^a + \alpha^i : i \in \mathcal{I}_1\}$.
Then $C(\mathcal{I}_1) = C(\mathcal{I}_2)$.

Proposition 3 may be thought of as the "addition counterpart" of
Proposition 1. Thus we can exhibit the existence of direct-sum subcodes in
the extended primitive BCH codes by partitioning the set
$\{\alpha^0, \alpha^1, \ldots, \alpha^{n-1}, \alpha^{\infty}\}$ into disjoint
subsets satisfying the condition of Proposition 3 with respect to some
given subset. Yet this set is just the field $GF(2^m)$. Thus it would
suffice to regard $GF(2^m)$ as a vector space, and partition it into a
subspace and its cosets. Notably, Proposition 3 may also be employed for
the derivation of the concurring-sum structure in the primitive binary BCH
codes. For more details on this see [5].

We now indicate how the direct-sum and the concurring-sum structures
may be employed for efficient maximum-likelihood soft decision decoding.
Let $s$ be the logarithm of the maximum number of states in the minimal
proper trellis of a linear code C. This parameter governs the complexity of
maximum-likelihood decoding of C using the trellis diagrams of [2]. It
follows from the trellis construction of Wolf [6] that
$s \le \min\{K, N - K\}$, where $N$ and $K$ are the block length and the
dimension of C. We employ the direct-sum and the concurring-sum structures
of C to substantially improve upon this upper bound. Assume that C contains
a subcode which is a direct sum of $h$ identical codes, each of dimension
$k$. Then by arranging the coordinates of C in alignment with its
direct-sum structure, it follows that $s \le K - (h-1)k$. Substituting the
parameters of the direct-sum structures that we were able to find using
the techniques described herein into this expression yields upper bounds
on $s$ which are often tighter than the bound of Wolf. Arranging the
coordinates of C in alignment with its concurring-sum structure also yields
low values of $s$ in all the primitive binary BCH codes. Some of the bounds
on $s$, resulting from the direct- and concurring-sum structures, are listed
in the table below. The table also lists the complexity of decoding the
primitive binary BCH codes using the proposed techniques, as compared to
the complexity of the conventional decoders (Viterbi decoding based on the
trellis of Wolf [6] for high-rate codes, and Fast Hadamard Transform
decoding [1] for low-rate codes). These figures are given in terms of the
number of real operations per bit of information. The computational gain
obtained reaches several orders of magnitude in many cases. For instance,
for the (64,30,14) extended BCH code the proposed techniques are about
1,000 times more efficient.


Code         Wolf    DS and CS    Lower   Conventional   Proposed
             bound   structures   bound   decoding       techniques

BCH[8,4]       4        3           3     16             6
BCH[16,11]     5        4           4     66             26
BCH[16,7]      7        6           5     128            42
BCH[16,5]      5        4           4     32             13
BCH[32,26]     6        5           5     160            66
BCH[32,21]    11       10           9     3413           1094
BCH[32,16]    16        9           9     20480          251
BCH[32,11]    11       10           9     2048           398
BCH[32,6]      6        5           5     64             27
BCH[64,57]     7        6           6     3.48 x 10^2    1.32 x 10^?
BCH[64,51]    13       12          11     1.91 x 10^4    6.76 x 10^?
BCH[64,45]    19       14          11     9.67 x 10^5    2.19 x 10^?
BCH[64,39]    25       20          11     4.04 x 10^7    8.81 x 10^?
BCH[64,36]    28       19          15     2.16 x 10^8    3.77 x 10^?
BCH[64,30]    30       21          16     6.08 x 10^8    6.06 x 10^5
BCH[64,24]    24       16          14     1.68 x 10^7    1.96 x 10^?
BCH[64,18]    18       17          16     2.62 x 10^5    3.37 x 10^?
BCH[64,16]    16       15          14     6.55 x 10^4    1.16 x 10^?
BCH[64,10]    10        9           9     1.02 x 10^3    4.61 x 10^?
BCH[64,7]      7        6           6     1.28 x 10^2    5.49 x 10^?
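The two upper bounds on $s$ quoted in the text, Wolf's bound $\min\{K, N-K\}$ and the direct-sum bound $K - (h-1)k$, are easy to evaluate. In the sketch below the function names are hypothetical, and the direct-sum parameters $h = 2$, $k = 1$ for the [8,4] code are an assumption used for illustration (two length-4 repetition codes on the two halves of the coordinates), not values taken from the source.

```python
def wolf_bound(n, k):
    """Wolf's upper bound on the log of the maximum number of trellis states."""
    return min(k, n - k)


def direct_sum_bound(k_total, h, k_component):
    """Bound s <= K - (h - 1) k when the code contains a direct sum of h
    identical subcodes of dimension k, with coordinates aligned to them."""
    return k_total - (h - 1) * k_component


# (N, K) pairs for a few of the codes in the table.
for n, k in [(8, 4), (32, 26), (64, 30), (64, 45)]:
    print((n, k), wolf_bound(n, k))

# Assumed direct-sum structure for the [8,4] code: h = 2 copies, each of dimension 1.
print(direct_sum_bound(4, h=2, k_component=1))
```

With these assumed parameters the direct-sum bound for the [8,4] code evaluates to 3, one less than Wolf's bound of 4.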
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Abstract 

We propose a Reed-Solomon code decoding algorithm based on
Newton's interpolation to speed up Generalized-Minimum-Distance
(GMD) decoding. This algorithm uses a modified Berlekamp-Massey
algorithm to perform all necessary GMD decoding steps in only one
run. The solutions generated by a Berlekamp-Massey algorithm if $i$
least reliable symbols are erased are used to generate the solutions
for 2 erasures less. By then using a time domain decoder the overall
asymptotic GMD decoding complexity becomes $O(dn)$, with $n$ the
length and $d$ the distance of the code. It can be shown that this GMD
decoding complexity is asymptotically minimal.


Summary 


Up to now the coding and decoding of Reed-Solomon codes has been based
on the Fourier transform. The approach proposed here uses Newton's
interpolation. Using interpolation for coding was already proposed
by Mandelbaum [4] back in 1979. Newton's interpolation has the
advantage that if one wants to add a new interpolation value then
only one additional coefficient has to be calculated. We use this for
GMD decoding.
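The property used here, that adding one interpolation value costs one additional Newton coefficient, can be sketched over a small prime field. The field GF(17), the sample polynomial, and the function names are assumptions made for the example; the source works over a general GF(q). The modular inverse via `pow(x, -1, P)` requires Python 3.8 or later.

```python
P = 17  # a small prime, so that integers modulo P form the field GF(17)


def newton_eval(xs, coeffs, x):
    """Evaluate sum_i c_i * prod_{j<i} (x - x_j) modulo P.

    Also returns prod_j (x - x_j) taken over all current points.
    """
    value, basis = 0, 1
    for xj, cj in zip(xs, coeffs):
        value = (value + cj * basis) % P
        basis = (basis * (x - xj)) % P
    return value, basis


def add_point(xs, coeffs, x_new, y_new):
    """Extend the interpolant by one point; earlier coefficients are untouched."""
    value, basis = newton_eval(xs, coeffs, x_new)
    c_new = (y_new - value) * pow(basis, -1, P) % P
    return xs + [x_new], coeffs + [c_new]


def f(x):
    """Hypothetical degree-2 polynomial used as the data source."""
    return (3 + 2 * x + 5 * x * x) % P


xs, coeffs = [], []
for x in (1, 2, 3, 4):
    xs, coeffs = add_point(xs, coeffs, x, f(x))
print(coeffs)  # the fourth coefficient vanishes because f has degree 2
```

Each call to `add_point` reuses all earlier coefficients, which is the feature that lets successive decoding attempts share work.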

We assume that the distance of the Reed-Solomon code is odd
($d = 2t+1$) and w.l.o.g. that the Reed-Solomon codewords over $GF(q)$
are defined by the evaluation of polynomials of degree less than or equal
to $n-1-2t$ with $n < q$ (i.e., the generator polynomial has all zeros at
the highest locations).

Let $\Psi_i(x) = \prod_{j=1}^{2(t-i)} (x - x_j)$ be the erasure locator polynomial
erasing the least reliable $2(t-i)$ locations. We get the key equation
on erasing these locations [2]:

$\Psi_i(x) \, \Lambda_i(x) \, E(x) = (x^n - 1) \, \Omega_i(x)$   (1)


with $E(x)$ the transform of the error vector, $\Lambda_i(x)$ the error locator
polynomial and $\Omega_i(x)$ the error value polynomial. Let $E(x)$ be given
in Newton coefficients:

m  = 

isl  jseO 

jmO 


Le.  5(*)  =  5b  +  EILT*  -  ».)  «nd  5;  =  Note 

that  for  a  received  word  R(x)  =  E{x)  +  C(x)  and  that  then  5(x)  is 
a  (known)  Newton  syndrome.  With 


_JmdEL. 


(2) 


and  as  9o(x)  =  (z”  -  l)/TI^^~*(z  -  zj)  the  snbproblems  of  GMD 
decoding  become:  If  2(t-«)  least  reliable  locations  are  erased  find  the 


polynomial  A;(z)  of  smallest  degree  stdving 

5(*)A.(*)  =  (3) 

with  deg(Ai(z))  :>  deg(  Ai(z))  (see  (2)).  This  solution  is  then  neces¬ 
sarily  unique  up  to  a  constant  factor. 

If we have solved (3) we wish to solve the next problem (with $i+1$ in
place of $i$) using the old solution.

However, we need not only the minimal solution of (3) but also
a second solution.

We  can  show  the  following: 

Let  Af  (z),  deg(Af(x))=d  be  the  minimal  polynomial  scdving  (S)  and 
Af  (z)  of  degree  Si— 1+1  be  another  solution  of  (S)  that  is  not  divided 
by  A^(x).  Then  the  nonzero  polynomials 

Ai,(z)  =  A(z)A-(z)-B(z)A.n*)  (5) 

with 

B(x)  =  gcd[t,.(z)/*,.«(z),n.-(z)]  (6) 
and  A(z)  the  minimal  solution  of 

A(z)n,-(»)-B(z)fi,t(z)  =  ^«ig-n+,(*)  (7) 

solve  (S)  with  it-i+1.  One  of  them  is  minimal  with  degree  k.  The 
other  has  degree  g(i+l)—k+l  and  it  is  not  divided  by  the  minimal 
solution. 

This proves that there exists an algorithm that solves the GMD
decoding problem in only one run. By transferring this algorithm into
the time domain as in [2] the overall complexity of the algorithm
becomes $O(nt)$.

It is easy to see that this asymptotic complexity is minimal: even
if there existed an algorithm that generated the GMD list without
any operation, the search for the nearest codeword alone
would already take $O(nt)$ operations. Thus it cannot be better.

Abstract
Suboptimal decoding of linear codes in "symmetric" memoryless channels
is considered. For $q$-ary codes of length $n \to \infty$ and code rate
$R$ the number of decoding operations is upper bounded by the value
$q^{n(c+o(1))}$, where $o(1) \to 0$ and $c = \min(R(1-R), (1-R)/2)$. The
decoding error probability $\varepsilon$ is upper bounded by twice the error
probability of maximum likelihood (ML) decoding, while $\varepsilon$ is
asymptotically equal to the ML error probability as $n \to \infty$. For
channels with discrete (quantized) output the better estimate
$c = R(1-R)/(1+R)$ is obtained.


Suboptimal decoding

Wolf's trellis algorithm [1] provides ML-decoding for all linear codes
in memoryless channels with decoding complexity $q^{n(c+o(1))}$, where
$n \to \infty$, $o(1) \to 0$ and the exponent $c = \min(R, 1-R)$. Below we
consider the decoding of linear codes in "symmetric" memoryless channels
with similar correcting capacity and smaller complexity. These channels
include as examples the discrete symmetric memoryless channels [2], the
AWGN channel, and the memoryless channel with 2-dimensional white
Gaussian noise and $q$-PSK modulation. We also consider the
complete minimum distance (MD) decoding algorithms and construct
the corresponding suboptimal coverings with polynomial complexity.

Consider a channel with a discrete set $X$ of $Q$ inputs and an
arbitrary output set $Y$ ($|Y| \le \infty$). Let $P_{Y|X}(y|x)$ define
the probability measure for each $x \in X$. For any finite output set
$Y_a \subset Y$, $|Y_a| = J_a$, consider the $(Q \times J_a)$-matrix
$P_a = P(y|x)$, $x \in X$, $y \in Y_a$, using inputs as rows and outputs
as columns. Following [2], the channel with an arbitrary output set $Y$
is defined to be symmetric if $Y$ can be partitioned into disjoint
finite subsets $Y_a$, $Y = \cup_a Y_a$, in such a way that in any matrix
$P_a$ each row is a permutation of each other row and each column (if
more than one) is a permutation of each other column.

For any output $y \in Y$ order all $Q$ elements $x \in X$ into (any) set
$X_y = \{x(1), x(2), \ldots, x(Q)\}$, where
$P(y|x(i)) \ge P(y|x(i+1))$ for all $i = 1, \ldots, Q-1$. Let $N(x)$
denote the number of the vector $x \in X$ in the ordered set $X_y$. Let
any subset $A$ of $S = |A|$ inputs be used with equal probability $1/S$.
For any received output $y \in Y$ let $D(y)$ be the ML-decoding
solution: $D(y) = x' \in A : N(x') < N(x)$, $\forall x \in A$,
$x \ne x'$. For the given subset $A$ of inputs define the decoding
algorithm $D_M$ by the following rule:

$$D_M(y) = \begin{cases} x' & \text{if } N(x') \le M, \\ \emptyset & \text{otherwise.} \end{cases}$$


Let $\varepsilon(M)$ be the error probability of $D_M$. Obviously,
$\varepsilon(Q) = \varepsilon_c$ is the error probability of
ML-decoding. The following theorem generalizes Lemma 1 of [3] and gives
an upper estimate on $\varepsilon(M)$ for $M \ge N$,
$N = \lceil Q/S \rceil$.


Theorem 1 The error probability of $D_M$-decoding in any symmetric
channel can be upper estimated for any $M = iN$, $i = 2, \ldots, S-1$,
as $\varepsilon(M) \le \varepsilon_c + \varepsilon_c/(i-1)$.
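
The decoding rule $D_M$ can be illustrated with a minimal sketch, under stated assumptions: the channel is given as a small transition matrix `P[x][y]`, ties in the ordering are broken by input index, and the decoder accepts the best candidate when its rank is at most `M` and otherwise returns `None` for a decoding failure. The function names and the toy channel are hypothetical and are not taken from the paper.

```python
def rank_of_inputs(P, y):
    """N(x) for every input x: 1-based rank when inputs are sorted by
    decreasing P[x][y] (ties broken by input index, an assumption)."""
    order = sorted(range(len(P)), key=lambda x: -P[x][y])
    return {x: r + 1 for r, x in enumerate(order)}


def decode_M(P, A, y, M):
    """Return the ML candidate x' in A if its rank N(x') <= M, else None."""
    N = rank_of_inputs(P, y)
    x_best = min(A, key=lambda x: N[x])
    return x_best if N[x_best] <= M else None


def error_probability(P, A, M):
    """Exact error probability of decode_M for equiprobable inputs in A."""
    err = 0.0
    for x in A:
        for y in range(len(P[0])):
            if decode_M(P, A, y, M) != x:
                err += P[x][y] / len(A)
    return err


# Toy symmetric channel with Q = 4 inputs and 4 outputs (illustrative only).
P = [[0.7 if x == y else 0.1 for y in range(4)] for x in range(4)]
A = [0, 2]  # the S = 2 inputs actually used
```

With `M = Q` the rule never rejects and coincides with ML decoding; smaller `M` trades a higher failure probability for a shorter candidate list.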


The  following  theorem  generalizes  algorithms  [3],  [4]  for  the  suboptimal 
soft  decision  decoding  in  symmetric  memoryless  channels. 

Theorem 2 Virtually all $q$-ary linear codes of length $n \to \infty$
and code rate $R$, $0 < R < 1$, can be decoded in a memoryless symmetric
channel with error probability equivalent to the error probability of
ML-decoding, and complexity $\kappa = q^{n(c+o(1))}$, where
$o(1) \to 0$ and $c = \min(R(1-R), (1-R)/2)$.


Moreover, the complexity exponents $c = R(1-R)$ and $c = (1-R)/2$
are valid for all linear cyclic codes and for all linear codes,
respectively. Suboptimal decoding can also decrease the complexity for
short code lengths. For example, the binary (24,12) Golay code can be
decoded using the 64 most reliable trellis nodes on two information
sets, the first 12 positions and the last 12 positions. Note that the
complete trellis diagram includes 4096 nodes.

Theorem 3 Virtually all $q$-ary linear codes of length $n \to \infty$
and code rate $R$, $0 < R < 1$, can be decoded in a memoryless discrete
symmetric channel with error probability equivalent to the error
probability of ML-decoding, and complexity $\kappa = q^{n(c+o(1))}$,
where $o(1) \to 0$ and $c = R(1-R)/(1+R)$.
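
To compare the complexity exponents quoted above, the following short sketch evaluates them for a given rate $R$. The function names are hypothetical; the formulas are the exponent of Wolf's trellis algorithm, $\min(R, 1-R)$, and the exponents stated in Theorems 2 and 3.

```python
def wolf_exponent(R):
    """Exponent of trellis ML decoding: c = min(R, 1 - R)."""
    return min(R, 1 - R)


def theorem2_exponent(R):
    """Exponent for symmetric memoryless channels: min(R(1-R), (1-R)/2)."""
    return min(R * (1 - R), (1 - R) / 2)


def theorem3_exponent(R):
    """Exponent for discrete symmetric channels: R(1-R)/(1+R)."""
    return R * (1 - R) / (1 + R)


if __name__ == "__main__":
    for R in (0.25, 0.5, 0.75):
        print(R, wolf_exponent(R), theorem2_exponent(R), theorem3_exponent(R))
```

At $R = 1/2$ the three exponents are $1/2$, $1/4$ and $1/6$, which shows how much the two suboptimal schemes reduce the exponent relative to full trellis decoding.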

Suboptimal  coverings 

The known information set decoding algorithms are founded on suboptimal
coverings in the Hamming metric. These coverings can be constructed by
random search [5] and provide asymptotically
$\varepsilon \sim \varepsilon_c$ as $n \to \infty$. We construct
suboptimal coverings with polynomial complexity, thereby providing
complete minimum distance decoding with the error probability
$\varepsilon_c$ and the same complexity exponent.

Let $S(n,t)$ be the set of vectors of Hamming weight $t$ in $F_2^n$. A
subset of $S(n,t)$ is called a covering $T(n,t,l)$, $t \ge l$, if any
vector in $S(n,l)$ is covered by some vector(s) from $T(n,t,l)$. We call
the covering $T(n,t,l)$ suboptimal if it has the lowest exponential
order: $\log_2 |T(n,t,l)| \sim \log_2\left(\binom{n}{l} / \binom{t}{l}\right)$
as $n \to \infty$. If the covering vector in $T(n,t,l)$ can be
constructed in polynomial time $c(n)$ for any vector in $S(n,l)$, we
call $T(n,t,l)$ a polynomial covering of complexity $c(n)$.

Theorem 4 A suboptimal covering $T(n,t,l)$ of complexity O(nlog2n)
can be constructed when $n \to \infty$, $t = \alpha n$, $l = \beta n$,
$0 < \beta < \alpha < 1$.
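
The counting bound behind the definition of a suboptimal covering can be checked on small parameters. The sketch below is illustrative only: it computes the bound $\binom{n}{l}/\binom{t}{l}$ and builds a covering with a naive greedy procedure, which is a stand-in chosen for brevity and is not the polynomial construction of Theorem 4. All names are hypothetical.

```python
from itertools import combinations
from math import comb


def covering_lower_bound(n, t, l):
    """Counting bound: each weight-t vector covers C(t, l) weight-l vectors."""
    return -(-comb(n, l) // comb(t, l))  # ceiling division


def is_covering(blocks, n, l):
    """True if every l-subset of {0..n-1} lies inside some block."""
    return all(any(set(s) <= b for b in blocks)
               for s in combinations(range(n), l))


def greedy_covering(n, t, l):
    """Naive covering: pad each still-uncovered l-subset up to size t."""
    blocks = []
    for s in combinations(range(n), l):
        if any(set(s) <= b for b in blocks):
            continue
        block = set(s)
        for j in range(n):
            if len(block) == t:
                break
            block.add(j)
        blocks.append(frozenset(block))
    return blocks
```

Supports of weight-$t$ vectors are represented as sets of coordinates; a vector of weight $l$ is covered when its support is contained in the support of some block.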

Conclusion 

The  minimum  distance  decoding  algorithms  of  the  papers  [3],  [4]  are 
generalized  for  the  suboptimal  decoding  in  an  arbitrary  memoryless 
channel,  providing  the  new  estimates  of  the  soft  decision  decoding 
complexity. 

Abstract

A soft decision decoding algorithm for array codes, using less
computation and with better performance than a previous algorithm, is
introduced. The new algorithm uses received symbol hard decision and
confidence values to optimise which sub-codes are selected for full
soft-minimum-distance decoding, and corrects more error patterns than
the previous algorithm.


SUMMARY 

A number of soft decision algorithms exist which aim to
reduce  the  number  of  computations  required  to  perform 
Maximum  Likelihood  soft  decision  decoding  (MLSDD)  with  as 
little  loss  in  performance  as  possible.  They  are  designed, 
according  to  different  purposes  and  applications,  to  minimise 
either  the  symbol  error  rate  or  the  codeword  error  rate. 

A soft decision decoding algorithm for array codes is
introduced in this paper. By taking advantage of the array code's
characteristics, this algorithm is designed to optimise decoding of
the array's sub-codes, with overall performance which is close to
MLSDD under certain conditions. As an improved scheme from
the original proposals [1][2], it decodes only selected sub-codes
instead of all the sub-codes of an array code, and also has the
ability, which the old algorithms lack, to correct full hard errors.


Concept of New Algorithm:

It is possible to reduce the number of soft decision
computations required because the distribution of the
number of errors within one codeword is dynamic, and the
number of errors is very small compared with the full code array
size. There always exist some sub-codes of the code with few
errors or none. We can begin by applying full soft decision
decoding to the sub-code with the fewest errors. After it has been
decoded successfully, all symbols of this sub-code are considered
to have the highest confidence, and are used to decode other
sub-codes. The procedure continues until all symbols have been
involved, rather than until all sub-codes have been decoded. For the
error-free sub-codes, of course, soft computation is not needed.

Decoding Algorithm:

The new efficient soft decision decoding algorithm is stated
below (for simple two-coordinate parity check array codes):

1. Compute the row sub-code and column sub-code
confidence sums.

2. Compute the syndrome for each sub-code using the hard
decision algorithm. Sub-codes with syndrome '0' and syndrome '1'
are labeled 'matched' and 'unmatched', respectively.

3. Rule out sub-codes with full confidence sums and the
'matched' sign, i.e., satisfying

$$\sum_i \left| \log \frac{P(r_i \mid 0)}{P(r_i \mid 1)} \right| \to \max .$$

These sub-codes do not need soft decision decoding, and their
symbols are taken as correct.


4. Decode the sub-codes, using full minimum soft distance,
in the following order: high confidence + 'matched' ->
high confidence + 'unmatched' -> low confidence + 'matched' ->
low confidence + 'unmatched'. (If both factors are the same,
choose the one with the larger number of rows/columns.)

5. After decoding each sub-code, re-calculate the confidence sums
of related sub-codes and change syndrome signs if necessary.

6. Repeat 3, 4, and 5 until all symbols in the array have been
involved in the decoding procedure.
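
Steps 1, 2 and the ordering rule of step 4 can be sketched as follows for a two-coordinate parity check array. This is a minimal illustration under stated assumptions: hard decisions are 0/1 values, confidences are non-negative numbers, the split between "high" and "low" confidence uses a hypothetical `threshold` parameter that the text does not specify, and ties keep their original order. The soft decoding of each sub-code itself is not shown.

```python
def subcode_table(hard, conf):
    """Steps 1 and 2: confidence sum and parity syndrome of every row and
    column sub-code. Each entry is (kind, index, confidence_sum, syndrome)."""
    n_rows, n_cols = len(hard), len(hard[0])
    table = []
    for r in range(n_rows):
        table.append(("row", r, sum(conf[r]), sum(hard[r]) % 2))
    for c in range(n_cols):
        col_conf = sum(conf[r][c] for r in range(n_rows))
        col_syn = sum(hard[r][c] for r in range(n_rows)) % 2
        table.append(("col", c, col_conf, col_syn))
    return table


def decoding_order(table, threshold):
    """Step 4 ordering: high+matched, high+unmatched, low+matched,
    low+unmatched. 'threshold' is an assumed parameter."""
    def key(entry):
        _, _, conf_sum, syndrome = entry
        return (0 if conf_sum >= threshold else 1, syndrome)
    return sorted(table, key=key)  # stable sort keeps ties in input order


# All-zero 4 x 4 codeword of a (16, 9) array code with one unreliable,
# flipped symbol at row 1, column 2 (illustrative data).
hard = [[0] * 4 for _ in range(4)]
conf = [[7] * 4 for _ in range(4)]
hard[1][2], conf[1][2] = 1, 1
order = decoding_order(subcode_table(hard, conf), threshold=25)
```

In this example the six sub-codes that do not contain the flipped symbol come first, and the row and column through the unreliable position are left for last.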

Performance Simulation:

The simulations assume a binary input 8-ary output AWGN
channel. The results obtained for the (16, 9) array code are
shown in the following figure.

[Figure omitted; horizontal axis: Eb/No]

Owing to its ability to correct full hard errors and
its other advantages, the new algorithm has a lower error rate
than the old algorithm.

With regard to decoding complexity, the new algorithm
reduces the number of soft decision computations required by
the old algorithm to a fraction of 0.5 or less. Alternatively, a
more powerful code can be used with no increase in complexity.
The new algorithm can be developed for use with more complex
array codes and other types of error-control code.

Introduction

We consider Generalized Minimum Distance (GMD) decoding,
proposed by Forney in [1]. Let a received vector
$r$ and an ordering of the positions in $r$ according to
some reliability information be given. One of the key
problems in GMD decoding is to decode a collection of
vectors $r_i$ obtained from $r$ by erasing more and more
positions. We solve the problem of finding error and
erasure positions in the different $r_i$ efficiently by using
relations between the decoding branches correcting different
numbers of erasures.

Error-Erasure  Location 

Given a linear code $C$ and a received vector $r$ we want
to correct $t$ errors and $\rho$ erasures. Formally, erasure
positions are set to zero. An error-erasure location for $C$
can be done with a so-called error locating pair of vector
spaces $(U, V)$ ([3], p. 347). The dimension $k_U$ of $U$
is greater than $t + \rho$, the minimum distance $d_V$ of $V$ is
greater than $t$ and

$U * V^\perp \subseteq C^\perp$, with

$U * V^\perp = \{(u_1 v_1, u_2 v_2, \ldots, u_n v_n) : u \in U, v \in V^\perp\}$.

Let $G_U$ be a generator matrix for $U$, $H_V$ a parity check
matrix for $V$ and let diag($r$) be the diagonal matrix
containing $r$ in its diagonal. The first step in finding
an error-erasure locating vector is to find the space of
solutions $\sigma$ to the key equation

$F \sigma^T = 0$, $\sigma \in \mathbb{F}^{k_U}$, where $F = H_V \,\mathrm{diag}(r)\, G_U^T$.

Denote this space by $\Sigma$. Using $\Sigma$ we find a subspace
of $U$ spanned by the vectors $\sigma G_U$, $\sigma \in \Sigma$. We
denote this subspace by $W$. Finally we restrict $W$ to the space of
vectors which are zero in erasure positions, thus yielding
the space of vectors which locate errors and erasures
with zeros.
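
As a concrete illustration of the first step, the sketch below forms the matrix $F = H_V\,\mathrm{diag}(r)\,G_U^T$ over a prime field GF(p) and computes a basis of the solutions of $F\sigma^T = 0$ by Gaussian elimination. It is a minimal sketch assuming a prime field and small dense matrices; the function names and the example matrices are hypothetical and do not form a valid error locating pair.

```python
def key_matrix(H_V, r, G_U, p):
    """F = H_V diag(r) G_U^T over GF(p), with matrices as lists of rows."""
    return [[sum(h[j] * r[j] * g[j] for j in range(len(r))) % p for g in G_U]
            for h in H_V]


def nullspace_mod_p(F, p):
    """Basis of {sigma : F sigma^T = 0} over GF(p), p prime."""
    rows = [[v % p for v in row] for row in F]
    n_cols = len(rows[0])
    pivots, r = [], 0
    for c in range(n_cols):
        if r == len(rows):
            break
        pr = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if pr is None:
            continue
        rows[r], rows[pr] = rows[pr], rows[r]
        inv = pow(rows[r][c], -1, p)  # modular inverse (Python 3.8+)
        rows[r] = [(v * inv) % p for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                f = rows[i][c]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for fc in (c for c in range(n_cols) if c not in pivots):
        v = [0] * n_cols
        v[fc] = 1
        for i, pc in enumerate(pivots):
            v[pc] = (-rows[i][fc]) % p
        basis.append(v)
    return basis
```

Each basis vector $\sigma$ then yields a vector $\sigma G_U$ of the subspace $W$ described above.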

Error-Erasure  Location  in  GMD  Decoding 

In the case of GMD decoding we have a collection
of error-erasure locating pairs $(U_i, V_i)$ correcting $t_i$
errors and $\rho_i$ erasures. We assume $U_i \subset U_{i+1}$ and
$V_i \subset V_{i+1}$. We then can write $G_{U_{i+1}}$ containing
$G_{U_i}$ in the first $k_{U_i}$ rows. In the same way $H_{V_i}$ is
contained in the first $n - k_{V_i}$ rows of a larger parity check
matrix. The corresponding set of key equations is

$F_i \sigma^T = 0$, $\sigma \in \mathbb{F}^{k_{U_i}}$, where $F_i = H_{V_i} \,\mathrm{diag}(r)\, G_{U_i}^T$.

Each key equation has a space of solutions denoted by
$\Sigma_i$. In a first step we obtain the spaces $\Sigma_i$ and the
corresponding spaces

$W_i = \{u : u = \sigma G_{U_i}, \sigma \in \Sigma_i\}$

described by generator matrices $G_{W_i}$. In a second step
we find the subspaces of $W_i$ which are zero in the desired
erasure positions.

We  define  a  matrix  F  by 

f  =  Hvidiag(r)Gjj, 

and  denote  the  submatrix  of  F  consisting  of  the  elements 
in  the  first  a  rows  and  first  6  columns  as  The 

following  relation  holds: 

p(n- 

The efficiency of the proposed procedure results from
the fact that we can find the solution spaces of all key
equations by dealing with only one matrix $F$. We apply
to $F$ a slightly modified version of the fundamental
iterative algorithm (FIA) proposed by Feng and Tzeng in [2],
which gives us the spaces $\Sigma_i$. The $\Sigma_i$ satisfy
$\Sigma_{i-1} \subset \Sigma_i$ and so $W_{i-1} \subset W_i$. We find a
generator matrix $G_{W_i}$ containing a generator matrix for
$W_{i-1}$ in the leading rows. The second task is now to find the
subspaces of $W_i$ that are zero in the desired erasure positions.
This is done by applying simple row operations to the matrix
$G_{W_i}$.

We have found an efficient procedure to obtain error-erasure
positions in every branch of a GMD decoding
scheme. The asymptotic complexity of this procedure is
in the general case given by $O(n^3)$.

In this paper, we provide an efficient algorithm for GMD (Generalized
Minimum Distance) decoding in which an algebraic errors-and-erasures
decoding procedure, the W-B method, needs to be executed only once,
whereas conventional GMD decoding requires up to $\lfloor d/2 \rfloor$
algebraic decodings ($d$: minimum distance of the code).

I. Generalized syndrome polynomial for errors-and-erasures


Let $R(z)$ be a received word polynomial, $C(z)$ a code word polynomial,
$E(z)$ an error polynomial, $e(z)$ an erasure polynomial and
$D(z) = E(z) + e(z)$ an errata polynomial, which satisfy the following
relation.

$R(z) = C(z) + E(z) + e(z) = C(z) + D(z)$ (1)


Here  we  will  propose  a  generalized  syndrome  polynomial  S(z) 
5(z)  - 

1.0  *  -  “ 


I  follows; 
(2) 


where $T(z)$ is an arbitrary $(d-1)$th degree polynomial, $R_i$ is the
$i$-th coefficient of $R(z)$, and the generator polynomial is given by

$G(z) = (z - \alpha^b)(z - \alpha^{b+1}) \cdots (z - \alpha^{b+d-2})$. (3)

When $T(z) = z^{d-1}$, $S(z)$ becomes a conventional syndrome polynomial.
By (1) and (2) we get the following relation.


5(z)  -  S 

iefo)  *  “  “ 

where  {D}  is  a  set  of  indices  of  errata  location.  From  (4),  we  have  the 
lowing  key  equation  for  decoding; 


(<) 

fol- 


<T(z)5(z)-q(z)T(z)-M*)  (5) 

where  <r(z)-  JI  (z-a‘)-  JJ  (*“<•*)  11  (*-<*‘)“<^s(*W*)  (**) 

K{D)  ieW  i€f«) 

q(z)-  S  n  S  n  (s-a^){6b) 

iefP)  >€{0)  j€fO) 


II. Reduced key equation

We  will  modify  the  key  equation  (5).  At  first,  a  set  of  indices  { T} 
and  a  polynomial  o-^Iz)  are  defined  such  that 

snd  T(a>0 , 16(0,1 . n-l)}  (7) 

n  U-o')  (*) 

iefr) 

T(z)  and  <rj^z)  are  known  polynomials  for  the  receiver,  so  that  <Tj{z)  is 
also  known  before  the  decoding  process.  By  (5),  iv(z)  has  also  a  factor  of 
<rj(z)  ,  as  for  <r(z)  and  r(z),  so  that  we  have 

if{z)S{z)<rri{z)f{z)+<Hz)  (9) 

where  a{z)-d{z)<ri{z),  7’(r)-f(z)<r7<s),  ij{z)-u{z)aj(z)  (10) 

We  will  call  (9)  a  reduced  key  eguation.  When  the  solution  of  d(z) 
and  w(z)  are  obtained,  we  can  also  calculate  the  errata  value  as  follows; 

For  T(a>0  A— l<i{o‘)/<^'(<»‘)l  /  («) 

For  T{a‘)-0  E,-[S(a‘)-<l>' (a‘)/<f  ' (a‘)]  /  [T' (a‘)a*l  (12a) 

e;-|S(o‘)-«  (<»')/*?  (“Ol  /  (I"  (“V*!  (‘2b) 

III. Application of the W-B method

We  will  apply  the  W-B  method  to  solve  (9).  For  the  index  i  such 
that  t(a*)-0,  we  have  from  (9) 


This  work  was  partly  supported  by  the  Telecommunications  Advancement 
Foundation. 


d(a’)S(a’)-w(a‘)  (13) 

The W-B method is a kind of iterative rational interpolation method in
which $\omega(z)$ and $\hat{\sigma}(z)$ are the numerator and
denominator polynomials of a rational function, respectively, and
$\alpha^i$ and $S(\alpha^i)$ correspond to the prescribed sampling
point and the sampled value. In order to stress the iterative meaning,
we write $\hat{\sigma}^{(k)}(z)$ and $\omega^{(k)}(z)$ with the
iteration index $k$ to denote the $k$-th solutions, i.e. they satisfy

U-h,  ■  •  .  .4-.)  (14) 

where  f(cri)-0  for  i6{>o>  .  .  .  ,<t>  •  -  .  iIm-i}  sud  m  means  the  number  of 
roots  of  ^z). 

IV. Efficient algorithm for GMD decoding

In  the  Forney's  procedure  (GMD  decoding)  (d-1)  most  unreliable 
received  sjrmbols,  are  selected  with  decreasing  reliability 

order,  i.e.,  9.j>9.  >  •  •  •  Thus,  by  using  an  arbitrariness  of  T(z),  we 
will  choose  T(z)  such  that 

r(z)-(z-<z^(z-a‘^  •  •  •  (*~a'**)  (15) 

For  the  (d-l-k)  erasures  and  [k/2j  errors-decoding,  the  erasure  locations 
are  (it,  .  .  .  ,>s-a).  The  errata  v^ues  can  be  calculated  by  (11)  or  (12). 
Note also that $\hat{\sigma}^{(k)}(z)$ is actually an error location
polynomial for $\lfloor k/2 \rfloor$ errors and does not include an
errata portion. Following Forney's procedure, we can derive an
efficient algorithm for GMD decoding. This procedure is schematically
shown in Fig. 1.


Fig. 1 Flow chart of an efficient GMD decoding algorithm


V. Concluding Remarks

In our algorithm, the GMD solution can be found in only one pass of
the W-B method.

References

[1] G.D. Forney, Jr., "Generalized minimum distance decoding," IEEE
Trans. Inform. Theory, vol. IT-12, pp. 125-131, April 1966
[2] L.R. Welch and E.R. Berlekamp, "Error correction for algebraic block
codes," presented at St. Jovite ISIT'83, 1983
[3] K. Araki and I. Fujita, "Generalized syndrome polynomials for
decoding Reed-Solomon codes," IEICE Trans. Fundamentals, pp. 1028-1029,
August 1992



A  Family  of  BCH  Codes  for  the  Lee  Metric 


Ron  M.  Roth*  Paul  H.  Siegel^ 


Let $C(n,r;p)$ be the (shortened) BCH code of
length $n$ over $GF(p)$ with a parity-check matrix
$[\alpha_j^i]_{i=0,\,j=1}^{r-1,\;n}$, where the $\alpha_j$'s are
distinct nonzero elements of the smallest field $GF(p^m)$ of size
greater than $n$. The minimum Lee distance of $C(n,r;p)$
will be denoted by $d_L(n,r;p)$.

Theorem.

$$d_L(n,r;p) \ge \begin{cases} 2r & \text{for } r \le (p-1)/2 \\ p & \text{for } r \ge (p+1)/2 \end{cases}$$


Comparing  the  codes  C{n,r\p)  with 
Berlekamp’s  negacyclic  codes,  it  follows  that  the 
theorem  yields  the  same  lower  bound  on  the  min¬ 
imum  Lee  distance  as  that  of  extended  negacyclic 
codes;  however,  given  p,  r,  and  redundancy,  the 
maximal  length  of  the  codes  C(n,r;p)  is  twice 
as  large  as  that  of  their  negacyclic  counterparts. 
Furthermore,  the  decoding  algorithm  of  C{n,  r;  p) 
appears  to  be  simpler  than  Berlekamp’s  decoding 
algorithm  for  the  negacyclic  case. 

For  fixed  p  and  r,  the  codes  C(p”*  —  l,r;p) 
approach  the  sphere-packing  bound  on  the  mini¬ 
mum  Lee  distance  as  m  tends  to  infinity. 

When  n  <  p  —  1,  the  codes  C(n,r;p)  become 
(generalized)  Reed-Solomon  codes  and  the  lower 
bound  in  the  theorem  can  be  improved  to 


$d_L(n,r;p) \ge 2r$, (1)

which,  for  r  >  |p,  can  further  be  improved  to 


$$d_L(n,r;p) \ge \frac{r+1}{2} + \frac{(r+1)^2}{4(p-1-r)} .$$


The codes $C(n,r;p)$ have an efficient decoding
procedure, based upon Euclid's algorithm, that
corrects all errors up to Lee weight $r-1$ and
detects all errors of Lee weight $r$ whenever the $2r$
lower bound applies. An error pattern in the Lee
space is viewed as additions of $+1$'s and $-1$'s at
(not necessarily distinct) entries of the codeword.
The positive (respectively negative) error-locator
polynomial $\sigma^+(x)$ ($\sigma^-(x)$) is the product of terms
$1 - \alpha_j x$ that correspond to all locations $j$, counting
multiplicity, in which Lee errors of the $+1$-type
($-1$-type) have occurred. A key ingredient in the
decoding algorithm of $C(n,r;p)$ is computing a
polynomial $\phi(x)$ which is congruent modulo $x^r$ to
the error-locator ratio $\rho(x) = \sigma^+(x)/\sigma^-(x)$. The
polynomial $\phi(x)$ is computed using the equality

$\rho(x) S(x) + x \rho'(x) = 0$,

taken modulo $x^r$, where $S(x)$ stands for the syndrome
polynomial. The polynomials $\sigma^\pm(x)$ are
then obtained by applying Euclid's algorithm to
$\phi(x)$ and $x^r$.

One of the applications that motivated this
work was analyzing the correction capability
of spectral-null codes for partial-response channels
[1]. These codes can be modeled as sets
$C(n,r)$ of integer vectors $(c_1, c_2, \ldots, c_n)$ satisfying the
equalities $\sum_{j=1}^{n} j^i c_j = 0$ for $i = 0, 1, \ldots, r-1$.

The $2r$ lower bound (1) applies also to $C(n,r)$.
In particular, the bound applies to codes with an
$r$th-order spectral null at zero frequency [1].
Furthermore, the decoding algorithm for $C(n,r;p)$
can be adapted to the codes $C(n,r)$ and thus can
be used in the scheme suggested in [1] for improving
the reliability of information transmission in
noisy partial-response channels by matching the
spectral nulls of the codes with those of the channel.
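
The $2r$ bound for the integer sets $C(n,r)$ can be checked by brute force on very small cases. The sketch below is illustrative only and rests on two assumptions: the Lee weight of an integer vector is taken to be the sum of the absolute values of its entries, and entries are restricted to a small range `[-amp, amp]` to keep the search finite. The function names are hypothetical.

```python
from itertools import product


def satisfies_nulls(c, r):
    """True if sum_j j^i * c_j == 0 for i = 0, ..., r-1 (j counted from 1)."""
    return all(sum((j ** i) * cj for j, cj in enumerate(c, start=1)) == 0
               for i in range(r))


def min_l1_weight(n, r, amp):
    """Smallest sum of |c_j| over nonzero integer vectors of length n with
    entries in [-amp, amp] that satisfy the r moment equalities."""
    best = None
    for c in product(range(-amp, amp + 1), repeat=n):
        if any(c) and satisfies_nulls(c, r):
            w = sum(abs(x) for x in c)
            if best is None or w < best:
                best = w
    return best
```

For `n = 6`, `r = 2` and `amp = 2` the search returns 4, attained for example by the vector (1, -2, 1, 0, 0, 0), which meets the $2r$ bound with equality.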

1. Introduction

To meet an increasing demand for high speed communication, high density
recording etc., the use of multilevel signaling has been widely
considered and developed. In systems employing multilevel signaling, it
is important to develop non-binary error control codes for improving
the reliability of the systems. Although binary codes based on the
Hamming metric have attracted the attention of most researchers,
non-binary error control codes have not been investigated as much.

It is well known that the Lee metric is suited to multiple-valued
systems such as communication systems employing multi-phase shift
keying, etc. [1]. Since it is obvious that the Lee distance is not less
than the Hamming distance, in order for a non-binary code to be used as
an error control code in a multiple-valued system, it is necessary for
the code to have a larger minimum Lee distance than its minimum Hamming
distance.

In this paper, we study the minimum Lee distances of generalized
Reed-Muller (GRM) codes [2, 3]. (A complete solution is known for the
minimum Hamming distance [4].)

A GRM code is defined as follows. Denote by $P_{m,\nu}$ the set of
polynomials over $GF(p)$ with $m$ variables $X_1, \ldots, X_m$ and total
degree not greater than $\nu$. Also denote by
$k = (k_1, k_2, \ldots, k_m)$ the expression of an integer $k$ in the
number system with radix (base) $p$, i.e.,

$$k = \sum_{i=1}^{m} k_i p^{i-1}, \quad 0 \le k_i < p.$$

In the following, we regard the $k_i$'s ($i = 1, 2, \ldots, m$) as
elements of $GF(p)$.

Definition 1 For $f = f(X_1, \ldots, X_m) \in P_{m,\nu}$ let

$c_k = f(k) = f(k_1, k_2, \ldots, k_m)$ $(0 \le k < p^m)$

and define $c_f$ and $c_f^*$ by

$c_f = (c_1, c_2, \ldots, c_{p^m-1})$, $c_f^* = (c_0, c_1, \ldots, c_{p^m-1})$.

Then the $\nu$-th order generalized Reed-Muller (GRM) code $C$ with code
length $p^m - 1$ and the $\nu$-th order extended generalized Reed-Muller
(e-GRM) code $C^*$ with code length $p^m$ are defined by [4, 5]

$C = \{c_f \mid f \in P_{m,\nu}\}$, $C^* = \{c_f^* \mid f \in P_{m,\nu}\}$.
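
For very small parameters, the definition can be turned directly into a brute-force search for the minimum Lee and Hamming distances of a GRM code. The sketch below is an illustration under stated assumptions: `p` is prime, each variable has degree at most `p - 1`, and the code is enumerated exhaustively, so it is only practical for tiny cases. The function names are hypothetical.

```python
from itertools import product
from math import prod


def lee_weight(x, p):
    """Lee weight of a symbol of GF(p), p prime: min(x, p - x)."""
    return min(x % p, (-x) % p)


def grm_min_distances(p, m, nu):
    """(min Lee distance, min Hamming distance) of the nu-th order GRM code
    of length p^m - 1, found by enumerating all nonzero codewords."""
    exps = [e for e in product(range(p), repeat=m) if sum(e) <= nu]
    # Evaluation points: all of GF(p)^m except the origin (k = 0 is dropped).
    points = [pt for pt in product(range(p), repeat=m) if any(pt)]
    mono = [[prod(pow(x, k, p) for x, k in zip(pt, e)) % p for pt in points]
            for e in exps]
    best_lee = best_ham = None
    for coeffs in product(range(p), repeat=len(exps)):
        word = [sum(c * row[i] for c, row in zip(coeffs, mono)) % p
                for i in range(len(points))]
        if not any(word):
            continue
        w_lee = sum(lee_weight(s, p) for s in word)
        w_ham = sum(1 for s in word if s)
        best_lee = w_lee if best_lee is None else min(best_lee, w_lee)
        best_ham = w_ham if best_ham is None else min(best_ham, w_ham)
    return best_lee, best_ham
```

Because the code is linear, the minimum distance equals the minimum weight over nonzero codewords. For the first-order code with `p = 5` and `m = 2` the search gives a minimum Lee distance of 24 and a minimum Hamming distance of 19.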

2.  Main  Theorems 

Our  main  results  are  as  follows. 

Theorem 1 A lower bound on the minimum Lee distance of the
$\nu$-th order GRM code with code length $p^m - 1$ is given by

^Imin  =  (1) 

where  <^<1  <^iAi  denote  the  minimum  Lee  distance  of  the  v-th  order 
GRM  code  with  code  length  p  -  1  and  the  minimum  Lee  distance  of  the  ir-th 
order  e-GRM  code  with  code  length  p,  respectively.  5 

Theorem 1 implies that a lower bound on the minimum Lee distance of the
$\nu$-th order GRM code with code length $p^m - 1$ ($m > 1$) can be
obtained from the minimum Lee distances of the corresponding $\nu$-th
order GRM code and e-GRM code with $m = 1$ alone.

We  also  give  the  true  minimum  Lee  distances  for  special  classes  of  GRM 
codes. 

Theorem 2 The minimum Lee distance of the first order GRM code
with code length $p^m - 1$ is given by

$d_{L\min} = p^m - 1 > d_{H\min}$

where $d_{H\min}$ denotes the minimum Hamming distance of the code.

Theorem 3 The minimum Lee distances of the $(p-2)$-th and $(p-1)$-th
order GRM codes are given by

$(p-2)$-th order : $d_{L\min} = 2p^{m-1} - 1$ $(= d_{H\min})$

$(p-1)$-th order : $d_{L\min} = p^{m-1} - 1$ $(= d_{H\min})$

and both equal their minimum Hamming distances.


3. Numerical Examples

Since the true minimum distances of GRM and e-GRM codes with shorter
code length can be obtained rather easily by computer search, the
expression of the lower bound derived in this paper enables us to get a
lower bound on the minimum Lee distance of a GRM code having a longer
code length.

We show in Tables 1 and 2 the lower bounds on the minimum Lee distances
for GRM codes obtained by Theorem 1, together with the minimum Hamming
distances for comparison. From Tables 1 and 2, we can see that there are
many GRM codes, marked by †, whose minimum Lee distances really exceed
the minimum Hamming distances. It is also confirmed that the lower
bounds shown in Tables 1 and 2 all agree with the true minimum Lee
distances, some of which, marked by *, are obtained by Theorems 2 and 3,
and others by finding codewords whose Lee weights are actually equal to
the lower bounds. Therefore we may conjecture that (1) gives the true
minimum Lee distance of the $\nu$-th order GRM code, although a rigorous
proof is left for further study.


Table 1: Lower bounds of minimum Lee distances of GRM codes for m = 2.

order ν           p = 5    p = 7    p = 11   p = 13   p = 17   p = 19
(code length)     (24)     (48)     (120)    (168)    (288)    (360)

1    d_Lmin       *†24     *†48     *†120    *†168    *†288    *†360
     d_Hmin       19       41       109      155      271      342

2    d_Lmin       †18      †48      †120     †168     †288     †360
     d_Hmin       14       34       98       142      254      322

3    d_Lmin       *9       †39      †120     †168     †288     †360
     d_Hmin       9        27       87       129      237      303

4    d_Lmin       *4       †26      †120     †168     †288     †360
     d_Hmin       4        20       76       116      220      284

5    d_Lmin       -        *13      †105     †168     †288     †360
     d_Hmin       3        13       65       103      203      265

6    d_Lmin       -        *6       †85      †151     †288     †360
     d_Hmin       2        6        54       90       186      246

Table 2: Lower bounds of minimum Lee distances of GRM codes for m = 3.

order ν           p = 5    p = 7    p = 11   p = 13   p = 17   p = 19
(code length)     (124)    (342)    (1330)   (2196)   (4912)   (6858)

1    d_Lmin       *†124    *†342    *†1330   *†2196   *†4912   *†6858
     d_Hmin       99       293      1209     2027     4623     6497

2    d_Lmin       †98      †342     †1330    †2196    †4912    †6858
     d_Hmin       74       244      1088     1858     4334     6136

3    d_Lmin       *49      †291     †1330    †2196    †4912    †6858
     d_Hmin       49       195      967      1689     4045     5775

4    d_Lmin       *24      †194     †1330    †2196    †4912    †6858
     d_Hmin       24       146      846      1520     3756     5414

5    d_Lmin       -        *97      †1205    †2196    †4912    †6858
     d_Hmin       19       97       725      1351     3467     5053

6    d_Lmin       -        *48      †965     †2023    †4912    †6858
     d_Hmin       14       48       604      1182     3178     4692

* : Also obtained by Theorem 2 or 3.
† : Exceeds the minimum Hamming distance.

Most non-binary error correcting codes are designed for
correcting error patterns regardless of error magnitudes and exhibit
the best performance when error magnitudes are equally likely.
However, in non-binary digital transmission links with fairly good
SNR, only a subset of error magnitudes occurs with non-negligible
probability. This circumstance suggests searching for codes that are
able to correct only error patterns composed of errors belonging to
the said magnitude subset. Such codes will be called EMSC codes
(Error Magnitude Subset Correcting codes). It is reasonable to
conjecture that t-error correcting EMSC codes exist which exhibit a
better rate in comparison with all possible conventional t-error
correcting codes with the same codeword length.

In this paper a class of Single Error correcting EMSC codes
(SE-EMSC codes) is obtained and an efficient decoding procedure is
proposed. The above mentioned conjecture is proved for these codes
by showing that they have a code rate greater than the value given for
conventional codes by the Hamming bound.

In  order  to  evaluate  EMSC  code  performance  an  extension  of  the 
Hamming  bound  has  been  worked  out  as  a  function  of  the  error 
magnitude  subset  cardinality.  With  respect  to  this  bound  EMSC- 
perfect  codes  have  been  defined  and  the  existence  of  EMSC-perfect 
SE-EMSC  codes  has  been  proved. 

An SE-EMSC code is defined by building its parity check matrix
$H'$. We derive the matrix $H'$ of an $(n', k')$ SE-EMSC code over
$GF(q)$ ($q = p^c$, $p$ prime) starting from the matrix $H$ of an
$(n, k)$ Hamming code over the same field with $n < n'$ and
$(n-k) = (n'-k') = m$.

Let $\beta_i$, $i = 1, 2, \ldots, a$ ($a \le q-1$), be the field
elements representing the $a$ error magnitudes that the code can
correct, and let $\alpha_j$, $j = 1, 2, \ldots, h$, be $h$ distinct
elements of $GF(q)$ such that

$\beta_i \alpha_j \ne \beta_r \alpha_s$ for any $i \ne r$ and $j \ne s$. (1)

For any sets $\{\beta_i\}$ and $\{\alpha_j\}$ satisfying (1) with
$h \ge 2$, an SE-EMSC code exists and it is defined by equation (2).

The code defined above corrects any single-error pattern with
error magnitude belonging to the set $\{\beta_i\}$ (if (1) holds, the
$n'a$ syndromes of the above error patterns are distinct).

As a simple example consider the case $q = 5$, $a = 2$, $\beta_1 = 1$,
$\beta_2 = 4$. By putting $h = 2$, $\alpha_1 = 1$, $\alpha_2 = 2$ in
(2), an SE-EMSC code can be derived from the (6, 4) Hamming code over
$GF(5)$ whose parity check matrix $H$ is given by

$$H^T = \begin{bmatrix} 0 & 1 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 2 & 3 & 4 \end{bmatrix} \qquad (3)$$

Decoding an SE-EMSC code is only slightly more complex than
decoding the Hamming code used for its construction. As (1) holds,
the first non-zero element of the syndrome uniquely defines the
error magnitude and the $n$-symbol sub-block (among the $h$ sub-blocks)
that contains the error. The error position in the sub-block is found in
the usual way by dividing the syndrome by its first non-zero element
and by looking for a match among the rows of $H$.
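
The decoding procedure just described can be sketched for the GF(5) example. The sketch is illustrative and makes the following assumptions: the SE-EMSC parity checks are obtained by scaling the six Hamming parity-check vectors by each $\alpha_j$ in turn (giving $n' = 12$), the correctable magnitudes are $\beta \in \{1, 4\}$, and the multipliers are $\alpha \in \{1, 2\}$. All names are hypothetical.

```python
P = 5
# Parity-check vectors of the (6, 4) Hamming code over GF(5), one per position.
HAMMING_CHECKS = [(0, 1), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4)]
ALPHAS = [1, 2]  # sub-block multipliers alpha_j (assumed values)
BETAS = [1, 4]   # correctable error magnitudes beta_i (assumed values)


def build_checks(checks, alphas, p):
    """Checks of the longer code: every Hamming check scaled by each alpha_j."""
    return [tuple((a * x) % p for x in chk) for a in alphas for chk in checks]


def syndrome(word, checks, p):
    """Syndrome of a word: sum over positions of symbol times check vector."""
    m = len(checks[0])
    return tuple(sum(w * chk[i] for w, chk in zip(word, checks)) % p
                 for i in range(m))


def decode_single_error(s, checks, alphas, betas, p):
    """Return (position, magnitude) of a single error, or None if s == 0."""
    if not any(s):
        return None
    lead = next(v for v in s if v)       # first non-zero syndrome element
    inv = pow(lead, -1, p)               # modular inverse (Python 3.8+)
    normalized = tuple((v * inv) % p for v in s)
    pos_in_block = checks.index(normalized)
    for j, a in enumerate(alphas):
        for b in betas:
            if (a * b) % p == lead:      # lead identifies sub-block and magnitude
                return j * len(checks) + pos_in_block, b
    raise ValueError("syndrome does not match a correctable single error")


EMSC_CHECKS = build_checks(HAMMING_CHECKS, ALPHAS, P)
```

With these values the 24 single-error syndromes are exactly the 24 non-zero vectors of $GF(5)^2$, so every non-zero syndrome decodes to one correctable error.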

As a practical example, we considered the application of the SE-EMSC
codes to a q-ary memoryless PAM channel with negligible
probability that, when an error occurs, the received amplitude level is
not adjacent to the transmitted one. In this application, for $c = 1$,
some of these codes are EMSC-perfect, while for $c > 1$ they are
equivalent to codes with $c = 1$. EMSC codes with $c > 1$ become
attractive in the case of multiple error correction or
multi-dimensional signal sets.

Work carried out in the framework of the agreement between the Italian PT
Administration and the Fondazione Ugo Bordoni.


$H'^T = [\, \alpha_1 H^T \mid \alpha_2 H^T \mid \cdots \mid \alpha_h H^T \,]$   (2)

and therefore n'=hn and k'=n'-m.

Introduction

A  channel  with  localized  errors  is  characterized  by  the 
property  that  possible  error  positions  are  known  to  the 
encoder  but  not  the  decoder.  Equivalently  we  can  say 
that  the  encoder  knows  the  positions  that  will  be  error- 
free  after  transmission.  The  other  positions  are  unreli¬ 
able. 

We  construct  block  codes  for  binary  channels  with  lo¬ 
calized  errors.  Of  course  ordinary  error  correcting  codes 
can  always  be  used  on  a  channel  with  localized  errors  by 
simply ignoring the additional information about possible
error positions. Some of our codes, however, are better
than any possible ordinary codes of the same lengths
and error correction capabilities.

Result 

Theorem  Given a shortened Hamming code of length
$2^m - 1 - i$ and a Hamming code of length $2^m - 1$, where
$m = 3, 4, \ldots$ and $i = 0, 1, \ldots, 2^m - 1$, a single
error-correcting code for localized errors of length $2^{m+1} - 2 - i$
and size

$\left\lfloor \frac{2^{2^m - 1 - i}}{2^{m+1} - i} \right\rfloor 2^{2^m - 1}$

can be formed.

Proof: (Construction) A block of length n is divided
into two subblocks of lengths $n_0$ and $n_1$, where $n_0 = 2^m - 1 - i$
and $n_1 = 2^m - 1$. Notice that $n = 2^{m+1} - 2 - i$.
Denote by $t_0$ the observed number of possible errors in
the first block.

For $t_0 = 1$ we use one of A codewords from the shortened
Hamming code in the first block and any vector
in the second block. For $t_0 = 0$ we use one of
$2^{2^m - 1 - i} - A(2^m - i)$ vectors in the first block (outside
the decoding spheres of the A vectors used in the previous
case) and one of the codewords of the Hamming
code in the second one. The size of the resulting code
is $\min\{A\, 2^{2^m - 1},\ (2^{2^m - 1 - i} - A(2^m - i))\, 2^{2^m - 1 - m}\}$. We
choose A so that these two quantities are as close as
possible. Since A must be an integer we take A equal
to $\lfloor 2^{2^m - 1 - i}/(2^{m+1} - i) \rfloor$. It is easy to check that A is less
than or equal to the size of the code used in the first block as
long as $i = 0, 1, \ldots, 2^m - 1$. With this choice of A the
size equals $A\, 2^{2^m - 1}$. □

Summary  and  Conclusion 

Denote by A(n,t) the optimal size of an ordinary error
correcting code of length n and error correction capability
t. It has been proved [3] that $A(2^{m+1} - 3, 1)$
equals $2^{2^{m+1} - (m+1) - 3}$ and that $A(2^{m+1} - 4, 1)$ equals
$2^{2^{m+1} - (m+1) - 4}$.

It is easily verified (with i = 1, 2) that our codes outperform
the doubly and triply shortened Hamming codes.
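
The comparison with shortened Hamming codes can be reproduced with a few lines of integer arithmetic. The Python sketch below is a hedged illustration: it assumes that the integer A is chosen as the floor of $2^{2^m-1-i}/(2^{m+1}-i)$ (the value that makes the first of the two quantities in the proof the smaller one), and the function names are hypothetical.

```python
def localized_code_size(m, i):
    """Size of the localized-error code of length 2^(m+1) - 2 - i (assumed A)."""
    n0 = 2**m - 1 - i                      # first block: shortened Hamming length
    n1 = 2**m - 1                          # second block: Hamming length
    a = 2**n0 // (2**(m + 1) - i)          # assumed integer choice of A
    first = a * 2**n1                      # case: possible error in first block
    second = (2**n0 - a * (n0 + 1)) * 2**(n1 - m)  # case: first block error-free
    return min(first, second)


def shortened_hamming_size(m, i):
    """Size of the Hamming code of length 2^(m+1) - 1 shortened by i + 1 positions."""
    return 2**(2**(m + 1) - 1 - (m + 1) - (i + 1))


if __name__ == "__main__":
    for m in (3, 4, 5):
        for i in (1, 2):
            print(m, i, localized_code_size(m, i), shortened_hamming_size(m, i))
```

With these assumptions the two sizes coincide for m = 3, and the localized-error code is strictly larger from m = 4 onwards.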

The construction can be generalized in several ways.
First, we can find other single error correcting codes by
using codes other than Hamming and shortened Hamming
codes. Second, the ideas can be used for constructing
codes correcting more than one error.

Abstract - Binary block codes for correcting asymmetric errors
are called binary AsEC block codes. In [1], the definitions of perfect
and weakly perfect binary AsEC block codes were introduced,
and some properties of such codes were studied. In the present
paper, we generalize these concepts and results to a larger class
of AsEC codes.

Summary 

A binary asymmetric error-correcting code (for short, AsEC code)
C of length n and minimum asymmetric distance $\Delta$, denoted by
$C_a(n,\Delta)$, is a non-empty proper subset of $\{0,1\}^n$ in which any
two distinct vectors are at asymmetric distance at least $\Delta$ apart
and this distance is realized at least once. With the asymmetric
distance metric, the notion of the minimum distance r(c) from a
certain codeword c to all other codewords was defined in [1]. r(c)
gives a measure of the error-correcting capability of
the codeword c which is better than that given by the minimum
asymmetric distance of the code. Also in [1], with the properties
of perfect codes for the binary symmetric channel in mind, natural
definitions of perfect and weakly perfect binary AsEC block codes
were given, which are related to the distance r(c). Some properties
of such codes were also derived there.

Since the packing spheres defined for asymmetric cases with respect
to the asymmetric distance metric extend only downwards,
namely only towards smaller weights, it follows that for a binary
AsEC block code one can sometimes increase the sizes of those
packing spheres, whether their radii are given in terms of the
minimum distance of the code or of the r(c)'s, such that all the
enlarged packing spheres still remain mutually disjoint. Therefore,
as far as the error-correcting capabilities of codes are concerned,
parameters other than the minimum distance and the r(c)'s can
be introduced for binary AsEC block codes, and subsequently
be used for the study of perfectness of such codes.

On the other hand, for the decoding of a code, one should
realize that a received word y can only come from the codewords
covering it. The strategy of a maximum likelihood decoder is of
course to decode the received word y to one of the codewords of
lowest weight covering y. In view of the error-correcting capability
of codes, one should also be aware of the following two facts. First
of all, if c is a codeword of a $C_a(n,\Delta)$ code C of weight less
than $\Delta$, then the error-correcting capability of c may be taken
to be any number which is greater than w(c). Hence r(c) does not
give an appropriate measure for the error-correcting capability of
c. Secondly, sometimes a codeword c may be able to correct more
than r(c) - 1 errors. Therefore, for the error-correcting capability
of codes, parameters other than the minimum distance and the
r(c)'s can be introduced. This motivates us to consider
perfect and weakly perfect codes capable of correcting asymmetric
errors in view of these new parameters, which leads to the present
paper.


We will call the weakly perfect codes defined in [1] r-WP
codes. The existence of r-WP $C_a(n,\Delta)$ codes was shown by example in
[1]. In this paper, we introduce a different parameter s(c) instead
of r(c). Generally, s(c) is larger than r(c). By using the same
definition as for r-WP codes, the so-called s-WP codes are defined in
the present paper. We denote by $X_a(n,\Delta)$ the maximum number
of codewords in an s-WP $C_a(n,\Delta)$ code, by $A_a(n,\Delta)$ the maximum
number of codewords in a $C_a(n,\Delta)$ code and by $W_a(n,\Delta)$ the maximum
number of codewords in an r-WP $C_a(n,\Delta)$ code. It can be
readily verified that any r-WP $C_a(n,\Delta)$ code is an s-WP code.
Thus, $W_a(n,\Delta) \leq X_a(n,\Delta) \leq A_a(n,\Delta)$. On the other hand, one
can find examples of s-WP codes which are not r-WP
codes. Hence the class of s-WP codes is larger than that
of r-WP codes, and any property derived for s-WP codes
applies to r-WP codes as well. For s-WP $C_a(n,\Delta)$
codes, the following main results have been obtained in this paper:
(1) A $C_a(n,\Delta)$ code C is s-perfect if and only if C is the
repetition code. (2) If $n > 2\Delta$, then $X_a(n,\Delta) < A_a(n,\Delta)$, which
also implies that if $n > 2\Delta$, then any nontrivial s-WP $C_a(n,\Delta)$
code cannot contain a codeword of weight greater than $n - \Delta$.
Therefore, an s-WP $C_a(n,\Delta)$ code with $n > 2\Delta$ can always be
enlarged with the all-one vector 1 to a bigger $C_a(n,\Delta)$ code.

1  Introduction

Error control codes such as single-bit error correcting and double-bit
error detecting (SEC-DED) codes are popularly used in computer high-speed
memories I'l. This paper proposes a new type of error control code which
indicates only a location of unidirectional errors clustered in b bits (b > 1)
length, called byte, but does not indicate accurate error bit positions in
the byte. This is considered to be cost-effective due to its low redundancy.
Also, this is very useful especially for diagnostic purposes in computer
systems which give the information to exchange faulty packages or faulty chips.
Code construction method of the single b-bit byte unidirectional error
locating code, called SbUEL code, and its lower bound on redundancy are
demonstrated in this paper.

2  Code  Construction 

Let $X = (X_1, X_2, \ldots, X_n)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$, where
$X_i, Y_i \in \{0,1\}^b$, i = 1, 2, ..., n, be two distinct binary codewords
each having length of n bytes, included in code C, i.e., $X, Y \in C$. In this
case, every byte $X_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,b})$ and
$Y_i = (y_{i,1}, y_{i,2}, \ldots, y_{i,b})$ has b-bit length, where
$x_{i,j}, y_{i,j} \in \{0,1\}$, i = 1, 2, ..., n and j = 1, 2, ..., b.

Definition 1 1*1: The function N is defined as

$N(X_i, Y_i) = |\{\, j \mid x_{i,j} = 1 \wedge y_{i,j} = 0 \,\}|$,

where |A| denotes the number of elements in the set A. Then the
unidirectional byte distance D is defined as

$D(X, Y) = \sum_{i=1}^{n} D(X_i, Y_i)$,

where

$D(X_i, Y_i) = 2$  if $N(X_i, Y_i) \neq 0 \wedge N(Y_i, X_i) \neq 0$,
$D(X_i, Y_i) = 1$  if $X_i \neq Y_i \wedge (N(X_i, Y_i) = 0 \vee N(Y_i, X_i) = 0)$,
$D(X_i, Y_i) = 0$  if $X_i = Y_i$. □

Definition 2 1^1: Unordered byte number between X and Y is defined as

$f(X, Y) = |\{\, i \mid D(X_i, Y_i) = 2 \,\}|$.


Theorem 1: Code C is an SbUEL code iff any words X and Y included in
C satisfy the following relation:

$D(X, Y) \geq 3$, or $f(X, Y) \geq 1$. □
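
The two definitions and the condition of Theorem 1 translate directly into code. The Python sketch below is illustrative: bytes are represented as tuples of bits, the thresholds 3 and 1 are read as "at least 3" and "at least 1", and all function names are hypothetical.

```python
from itertools import combinations


def n_count(x_byte, y_byte):
    """N(X_i, Y_i): number of positions where X_i has a 1 and Y_i has a 0."""
    return sum(1 for x, y in zip(x_byte, y_byte) if x == 1 and y == 0)


def byte_distance(x_byte, y_byte):
    """Per-byte distance: 2 for unordered bytes, 1 for ordered unequal, 0 if equal."""
    down = n_count(x_byte, y_byte)
    up = n_count(y_byte, x_byte)
    if down and up:
        return 2
    if down or up:
        return 1
    return 0


def unidirectional_byte_distance(x, y):
    """Sum of the per-byte distances over all bytes of the two words."""
    return sum(byte_distance(a, b) for a, b in zip(x, y))


def unordered_byte_number(x, y):
    """Number of byte positions in which the two words are unordered."""
    return sum(1 for a, b in zip(x, y) if byte_distance(a, b) == 2)


def satisfies_theorem_1(code):
    """Check the pairwise condition (distance >= 3 or an unordered byte)."""
    return all(
        unidirectional_byte_distance(x, y) >= 3 or unordered_byte_number(x, y) >= 1
        for x, y in combinations(code, 2)
    )


if __name__ == "__main__":
    good = [((1, 0), (0, 0)), ((0, 1), (0, 0))]   # first bytes are unordered
    bad = [((1, 1), (0, 0)), ((1, 0), (0, 1))]    # two ordered differing bytes
    print(satisfies_theorem_1(good), satisfies_theorem_1(bad))
```

The second toy pair fails because one word with an error in its first byte and the other with an error in its second byte can yield the same received word, so the faulty byte cannot be located.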

Code construction algorithm:

(1) Let the input word having k bytes be $D = (D_1, D_2, \ldots, D_k)$, where
$D_i$, i = 1, 2, ..., k, represents the information byte with fixed length of
b bits.


(2) Let w(D) be a concatenation of the weight of each information byte, that
is,

$w(D) = \{w(D_1), w(D_2), \ldots, w(D_k)\}$,

where $w(D_i)$ represents the weight of the information byte $D_i$ and has
value ranging from zero to b. Therefore, the bit-length of $w(D_i)$, shown as
$b_w$, is equal to $\lceil \log_2(b + 1) \rceil$, where $\lceil \theta \rceil$ represents
the smallest integer greater than or equal to $\theta$.


(3) The maximal linear code of the single $b_w$-bit byte error correcting code
($Sb_wEC$ code)l*l is applied to encode the above defined w(D). That is,
multiplying w(D) by the encoding parity check matrix of the maximal
code of the $Sb_wEC$ code, $H_P$, i.e., $w(D) \cdot H_P$, generates the check
bytes $CB_1, CB_2, \ldots, CB_{r+1}$, where $CB_i$, i = 1, 2, ..., r, is a check byte
having length of $b_w$ and $CB_{r+1}$ is the last check byte having length g,
$0 \leq g < b_w$. The $H_P$ of the maximal code is shown as

[Matrix $H_P$: a block array built from $I_{b_w}$, $0_{b_w}$ and powers of $\alpha$; its layout is not recoverable from the source.]

where $R = r \cdot b_w + g$, $I_{b_w}$ is a $b_w \times b_w$ identity matrix, $0_{b_w}$ is a
$b_w \times b_w$ zero matrix and $\alpha^i$ expresses a coefficient column vector of
$x^i \bmod g(x)$, where g(x) is a primitive polynomial with degree of $(R - b_w)$.

(4) By appending the check bytes to the original input word D, the codeword
of C is given by

$[D \mid CB_1, CB_2, \ldots, CB_r, CB_{r+1}]$. □

Theorem 2: The set of codewords obtained from the above steps (1) to (4)
is an SbUEL code. □

3  Evaluation 

Theorem 3: Let k be the number of information bytes with b bits/byte.
Then, any code that locates single byte unidirectional errors needs at least
$\lceil \log_2(k \cdot b + 1) \rceil$ check bits. □
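
Step (2) of the construction and the bound of Theorem 3 are simple enough to evaluate directly. The Python sketch below is a hedged illustration: it assumes the weight field length is the ceiling of $\log_2(b+1)$ and reads the bound of Theorem 3 as the ceiling of $\log_2(k \cdot b + 1)$; the function names are hypothetical, and step (3) is not reproduced because it needs the parity check matrix of the inner byte error correcting code.

```python
def ceil_log2(x):
    """Smallest integer t with 2**t >= x, for integer x >= 1 (exact, no floats)."""
    return (x - 1).bit_length()


def weight_vector(data_bytes):
    """w(D): the Hamming weight of every information byte."""
    return [sum(byte) for byte in data_bytes]


def weight_field_bits(b):
    """Bits needed to represent a byte weight in the range 0..b."""
    return ceil_log2(b + 1)


def check_bit_lower_bound(k, b):
    """Assumed reading of Theorem 3: at least ceil(log2(k*b + 1)) check bits."""
    return ceil_log2(k * b + 1)


if __name__ == "__main__":
    word = [(1, 0, 1, 1), (0, 0, 0, 0), (1, 1, 1, 1)]  # k = 3 bytes, b = 4 bits
    print(weight_vector(word), weight_field_bits(4), check_bit_lower_bound(3, 4))
```

Integer bit lengths are used instead of floating-point logarithms so that the ceiling is exact for every input size.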

Figure 1 shows an example of the relation between the check bit-length
and the information bit-length of the SbUEL codes when b = 4 bits. In
this figure, the dotted line shows the lower bound on the check bit length
mentioned in Theorem 3. The broken line shows the case of a code
proposed by Dunning et al^) which is originally a double byte unidirectional
error detecting code for the set of weight symbols of the input word over
GF(p), where p is a prime larger than b, and therefore can be regarded as
an SbUEL code.


Figure 1: Check-bit length vs. information-bit length of the S4UEL code

4  Conclusion 

This paper has proposed the construction method of a new type of
unidirectional error control code which indicates the location of single byte
unidirectional errors in the received word. It has clarified the necessary and
sufficient conditions for this type of code, and the lower bound on the check
bit length.

If the linear code having the minimum Hamming distance $d_H$ over
$GF(2^{b_w})$ is applied to the proposed code construction method, we can get,
in general, the bytes unidirectional error locating codes for an odd
number $d_H$, and the bytes unidirectional error locating and bytes
unidirectional error detecting codes for an even number $d_H$.

Abstract  This paper presents three new maximum likelihood decoding
(MLD) algorithms for linear codes over Z-channel, which are much
more efficient than conventional exhaustive algorithms. In the proposed
algorithms, the complexities are reduced by employing the projecting set
$C_s$ of the codewords, which is determined by the "projecting" structure of
the code. Namely, the complexities of the algorithms mainly depend upon the
size of $C_s$, which is several times smaller than the total number of
codewords. It is shown that the complexities of the three decoding algorithms
are in proportion to the number of zeros in the received word, the Hamming
weight of the received word, and the number of parity bits, respectively.

1.  Introduction 

In optical communication systems or semiconductor memories, it is
known that the communication channel can usually be modeled by the
Z-channel. In the Z-channel, symbol '0' does not change to symbol '1', though
'1' changes to '0' with probability $\epsilon$. To make the most of the error
correcting ability of codes, a maximum likelihood decoding (MLD) algorithm
for cyclic codes over Z-channel has been reported [1]. However, in order
to improve the performance of communication employing error correcting
codes, it is important to develop efficient MLD algorithms for a much wider
class of codes such as linear codes. This paper presents three new MLD
algorithms for linear codes over Z-channel, which are much more efficient
than conventional exhaustive algorithms, and clarifies the complexities of
these algorithms.

2.  Preliminaries

Consider a linear binary (n, k, d) systematic code C, where n, k and d are
the code length, the number of information bits and the minimum Hamming
distance of C, respectively. Assume that all codewords are equally likely,
and an MLD algorithm is defined as follows:

[Definition 1] (MLD algorithm) For a received word r, an MLD
algorithm chooses a codeword c to maximize the conditional probability
Pr(r|c) that a word r is received when a codeword c is sent.

Here, we introduce the concept of "projecting" and "projecting set" as
follows [2]:

[Definition 2] (Projecting) If nonzero codewords $c_1$ and $c_2$ satisfy
$c_1 \wedge c_2 = c_2$, $c_1$ is projected by $c_2$, and we denote $c_2 \preceq c_1$, where
$\wedge$ denotes a bit-by-bit AND operator.

[Definition 3] (Projecting set) The projecting set of a linear code C,
denoted by $C_s$, is the smallest subset of nonzero codewords of C such that
for any nonzero codeword $c \in C$ and $c \notin C_s$, there exists a $c_s \in C_s$ which
projects into c.

It is reported that the actual number of codewords in $C_s$ is at most
several times smaller than the total number of codewords [2].

3.  MLD  Algorithms  for  Z-channel 

By employing the projecting set $C_s$, we propose the following efficient
MLD algorithms for Z-channel. In these algorithms, let w(x) denote the
Hamming weight of x, let $w_k(x)$ denote the Hamming weight of the first k bits
of x, and let $\oplus$ denote the modulo-2 addition.

[ MLD Algorithm I ] For the received word r:

1. Find a codeword $c_1 \in C$ such that $r \preceq c_1$ †.

2. If $w(c_1 \oplus c_s) \geq w(c_1)$ for all $c_s \in C_s$ satisfying $c_s \wedge r = 0$, go to 4.

3. Find $c_{s_{min}}$ which minimizes $w(c_1 \oplus c_s)$ among $c_s \in C_s$ satisfying
$c_s \wedge r = 0$. Then $c_1 \leftarrow c_1 \oplus c_{s_{min}}$ and go to 2.

4. $c_1$ is the desired codeword. □
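
The four steps of Algorithm I can be sketched in a few lines for a small code. The Python example below is illustrative rather than the authors' implementation: codewords are held as integer bit masks, the projecting set is found by brute force as the set of minimal nonzero codewords, the (7, 4) Hamming generator rows in `HAMMING_G` are an assumed toy code, and the function names are hypothetical.

```python
from itertools import product

# Assumed toy code: systematic (7, 4) Hamming code, rows written as 7-bit masks.
HAMMING_G = [0b1000110, 0b0100101, 0b0010011, 0b0001111]


def codewords(gen_rows):
    """All codewords spanned by the generator rows (modulo-2 sums)."""
    words = set()
    for coeffs in product((0, 1), repeat=len(gen_rows)):
        word = 0
        for coeff, row in zip(coeffs, gen_rows):
            if coeff:
                word ^= row
        words.add(word)
    return words


def weight(x):
    return bin(x).count("1")


def covers(c, r):
    """True when every 1 of r is also a 1 of c (r is projected into c)."""
    return c & r == r


def projecting_set(code):
    """Brute force: nonzero codewords that contain no other nonzero codeword."""
    nonzero = [c for c in code if c]
    return [c for c in nonzero
            if not any(d != c and (d & c) == d for d in nonzero)]


def mld_algorithm_1(r, code, proj):
    # Step 1: any covering codeword will do; start from the heaviest one.
    c1 = next(c for c in sorted(code, key=weight, reverse=True) if covers(c, r))
    while True:
        # Candidates are disjoint from r, so c1 ^ cs still covers r.
        candidates = [cs for cs in proj if cs & r == 0]
        best = min(candidates, key=lambda cs: weight(c1 ^ cs), default=None)
        if best is None or weight(c1 ^ best) >= weight(c1):
            return c1                      # steps 2 and 4
        c1 ^= best                         # step 3


def mld_exhaustive(r, code):
    """Reference decoder: lowest-weight codeword covering r."""
    return min((c for c in code if covers(c, r)), key=weight)


if __name__ == "__main__":
    code = codewords(HAMMING_G)
    proj = projecting_set(code)
    r = 0b1000100                          # 0b1000110 with one 1 -> 0 error
    print(bin(mld_algorithm_1(r, code, proj)), len(proj), len(code))
```

For this toy code the projecting set holds 14 of the 16 codewords, so the saving is negligible; the benefit claimed in the text concerns codes whose projecting set is much smaller than the full codebook.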

[ MLD Algorithm II ] For the received word $r = (r_1, r_2, \ldots, r_n)$:

1. Let i = 1, $c_1 = 0$ and $\mathbf{r}_j = (r_1, r_2, \ldots, r_j, 0, \ldots, 0)$ (j = 1, 2, ..., n).

2. If $\mathbf{r}_i \not\preceq c_1$, find $c_{s_{min}}$ which minimizes $w(c_1 \oplus c_s)$ among $c_s \in C_s$
satisfying $\mathbf{r}_i \preceq (c_1 \oplus c_s)$. Then $c_1 \leftarrow c_1 \oplus c_{s_{min}}$.

3. If i < n then $i \leftarrow i + 1$ and go to 2.

4. $c_1$ is the desired codeword. □

[ MLD Algorithm III ] For the received word $r = (r_1, r_2, \ldots, r_n)$:

1. Let i = 1 and $\mathbf{r}_j = (r_1, r_2, \ldots, r_{k+j}, 0, \ldots, 0)$ (j = 1, 2, ..., n - k). Let
$c_1$ be a codeword specified by

$c_1 = (c_{1,1}, c_{1,2}, \ldots, c_{1,n}) = (r_1, r_2, \ldots, r_k)\, G$,

where G is the k x n generator matrix of the code C.

2. If $c_{1,k+i} = 1$, find $c_{s_{min}}$ which minimizes $w(c_1 \oplus c_s)$ among
$c_s \in C_s \cup \{0\}$ satisfying $\mathbf{r}_i \preceq (c_1 \oplus c_s)$. Then $c_1 \leftarrow c_1 \oplus c_{s_{min}}$.

3. If i < n - k then $i \leftarrow i + 1$ and go to 2.

4. $c_1$ is the desired codeword. □

In these algorithms, the complexities mainly depend upon the size
of $C_s$. Since the size of the projecting set $|C_s|$ is several times smaller
than the total number of codewords ($2^k$), both computational and space
complexities for these algorithms are significantly reduced compared with
conventional exhaustive algorithms. The upper bounds on the complexities of
the algorithms are shown in Table 1. It should be noted that the combination
of Algorithms I and II yields an MLD algorithm with maximum number of
comparisons $\min\{w(r), n - w(r)\}\,|C_s|$.

4.  Conclusion

This paper proposes some efficient MLD algorithms for linear codes over
Z-channel, and clarifies their complexities.

Table 1.  Complexities of the proposed algorithms

Algorithm     Number of codewords stored     Maximum number of comparisons
I             $|C_s|$                        $(n - w(r))\,|C_s|$
II            $|C_s|$                        $w(r)\,|C_s|$
III           $|C_s| + k$                    $(n - k)\,|C_s|$
Exhaustive    $2^k$ or t                     $2^k$

† If (1, 1, ..., 1) is a codeword of C, this codeword always satisfies the condition.

Abstract: The problem of coherent multiuser detection is considered
for the K-user asynchronous Gaussian Code-Division Multiple-Access
(CDMA) channel. The maximum likelihood sequence detector
(MLSD) is asymptotically optimal in that it achieves the highest error
exponent of the bit error probability for each user. However, the
MLSD can only be implemented by a dynamic programming algorithm
whose complexity depends exponentially on K. In order to mitigate
the complexity of this scheme, a class of group detection strategies
is derived based on optimal statistical inferential procedures. Each
member of this class of detectors corresponds to an L group partition
of the K users, and consists of a bank of L group detectors, one for
demodulating the information symbols of users in each group. Each
group detector is a reduced state sequence detector with the dominant
complexity determined by the computation of the solution of a
combinatorial optimization problem via a forward dynamic programming
algorithm. This algorithm has a complexity that is exponential in the
number of users in the corresponding group. The overall complexity is
determined by the size of the largest group, which is a design parameter
that can be chosen to be only as large as complexity considerations
allow. The performance analysis of the group detection scheme is
obtained by deriving asymptotically tight upper and lower bounds on the
bit error probability, thereby characterizing its multiuser asymptotic
efficiency.

Summary 

The idea of group detection was introduced by the author in [2] in
the context of multiuser detection over a QAM synchronous Gaussian
CDMA channel. The synchronous channel is memoryless and therefore
single-shot decisions can be optimal. However, the more general
asynchronous problem is inherently a sequence detection problem and
new problems arise in generalizing the results in [2].

It was shown in [2] that for an arbitrary L group partition
$\bigcup_{l=1}^{L} G_l = \{1, \ldots, K\}$ of the set of K active users in an M-ary QAM
synchronous CDMA channel, a generalized likelihood ratio test-based group
detection scheme can be implemented in parallel as a bank of L group
detectors, one for each group in the partition. The $l$-th group detector
jointly demodulates the users in the group $G_l$ and the time complexity
per symbol (TCS) of the group detector is $O(M^{|G_l|}/|G_l|)$ for
M-ary QAM alphabets. From complexity considerations alone, the
trivial single-user partition $\bigcup_{l=1}^{K} \{l\} = \{1, \ldots, K\}$ is the most desirable
and yields the decorrelating detection scheme with a complexity that
is independent of K. Performance considerations, however, tell a different
story. A key result in [2] establishes that a group G detector is
optimally group near-far resistant in the sense that, for each user in
G, it achieves the highest achievable worst-case asymptotic efficiency
over the signal amplitudes of users not in G. As a consequence, viewed
from the performance viewpoint alone, membership of a given user in
a larger group is preferred over that in a smaller group contained in
it. A larger group size, however, brings with it a higher complexity.

A vector space interpretation of the group detector for the synchronous
channel involves the direct sum decomposition of the space
spanned by the signature signals of all the users into two subspaces,
one of which is spanned by the signals of the users in the group to be
demodulated and the other by the rest of the signals. The complexity
of the group detector is due to the computation of orthogonal projections
of certain transformations of the outputs of a bank of matched
filters (matched to the orthonormal bases of the K dimensional signal
space) onto the perp space of the subspace spanned by the users not
in the group $G_l$. In generalizing this approach to the asynchronous
channel, the two subspaces in the direct sum decomposition generalize
to those spanned by all time-shifted (by integer multiples of
symbol durations) versions of signature signals of users belonging to
the group under consideration and those that do not belong to this
group. The number of orthogonal projections that need to be computed
in this case is (A' is the packet length!) and it can be
shown that there is no solution with a complexity that is independent
of the packet length. A key result of this paper is the derivation
of an alternative oblique projections-based group detection strategy
where the $M^{|G_l|}$ oblique projections can be computed by a forward
dynamic programming algorithm whose complexity is independent of
the packet length and depends exponentially only on the number of
users in the group that it demodulates. Since the group size is a design
parameter, it can be chosen to be only as large as complexity
considerations allow.

It was seen in [2] that the performance analysis of the group detection
scheme for the synchronous channel could be deduced from a
result on the equivalence of a group-G detector with a maximum likelihood
detector in a fictitious |G|-user synchronous Gaussian CDMA
channel. However, this equivalence doesn't hold for the oblique
projections-based group detector for the asynchronous channel. In
fact, it is shown that the orthogonal projections-based group detector
of [Var92] when generalized to the asynchronous channel, though not
practically implementable, has an asymptotic efficiency performance
that is an upper bound on the performance of the oblique projections-based
group detection scheme. The second key result of this paper is
the derivation of asymptotically tight upper and lower bounds on the
bit error probability of the proposed group detection scheme for the
asynchronous channel, thereby characterizing its asymptotic efficiency.

The design and analysis of the reduced state group detection scheme
obtained in this work provides a unifying treatment of the multiuser
detection problem in the sense that two detectors corresponding to
two trivial partitions result in previously proposed schemes. The case
of a partition of users into one large group of size K yields the MLSD
obtained in [3] with the highest possible asymptotic efficiency for each
user, but at the price of an exponential complexity in K. The other
extreme case of a partition that consists of K groups, each of size one,
results in a group detection scheme which reduces to the decorrelating
detector [1]. This detector requires only a K-input K-output digital
filter following the bank of matched filters. All other partitions yield
new detection schemes and the interplay between the complexity and
the performance of these schemes will be presented.

Abstract

Performance analyses of fiber-optic code division multiple access
(FO-CDMA) systems are intractable and often, Monte Carlo simulations
that yield realistic estimates of system performance require
a large number of simulation trials for the estimates to be
in a reasonable interval of confidence. We develop an Importance
Sampling technique to estimate the performance of direct detection
FO-CDMA systems, where the "gain" of Importance Sampling
over Monte Carlo simulations is shown to increase linearly with the
system performance. The quick simulation technique developed extends
to avalanche photodetection and is also compatible with a
wide variety of coding schemes. Using these efficient simulations,
we present a comparative analysis of systems employing
optical-orthogonal-codes and prime-sequences, where only 50-100 trials are
required for estimating error probabilities of 10 and below. Based
on an inexact Fourier expansion of the Poisson complementary probability
distribution function, we derive approximations for the probability
of error that require computations increasing linearly with
the number of users as opposed to an exponential increase in the
case of exact evaluation of the error probability. The inaccuracy of
the results is shown to be bounded.

System Description: A FO-CDMA system is considered where
the information bit of each user is modulated onto the intensity of
the laser transmitted through a single-mode fiber channel. If user
k is sending bit i, under hypothesis $H_i$, then the intensity of the
modulated light is given as

$\lambda_i^{(k)}(t) = \sum_{n=1}^{N} A_i^{(k)}(n)\, \Pi_{T_c}(t - nT_c)$, i = 0, 1; for $t \in (0, T)$   (1)

where $\Pi_{T_c}(t)$ is a unit rectangular pulse of duration $T_c$, and
$A_i^{(k)} = (A_i^{(k)}(1), \ldots, A_i^{(k)}(N))$ is a signature sequence of length $N = T/T_c$
with each $A_i^{(k)}(n) \in \{0, 1\}$. At the receiving end, this gives rise
to  the  following  two  hypotheses  at  receiver  of  the  desired  user 
(taken  to  be  user  1)  in  the  time  interval  [0,T)  as  Hi  :  Ap^(t)  = 
where  the  symbol  b  denotes  the  information 
of the k-th user. The receiver corresponding to user 1 has a replica
of the signature sequence assigned to this user, and the light in the
channel is correlated with this replicated signature sequence. The
correlated intensities are incident on an ideal photodiode and the
resulting photoelectron count is compared to a threshold for data
recovery. Without loss of generality, we assume that each user is
employing on-off keying, and hence at the 1st receiver, the intensities
are correlated with $A_1^{(1)}$, since $A_0^{(1)} = 0$. For an $(N, J, \rho_a, \rho_c)$ optical
code sequence (i.e., a sequence of length N, weight J, and auto- and
crosscorrelation constraints $\rho_a$ and $\rho_c$ respectively) the probability
of r photoelectrons occurring, under $H_i$, in the sampling interval T
is given as

$P_{R|H_i}(r) = \sum_{k=0}^{\rho_c(K-1)} P_{I^{(1)}}(k)\, P_{R|I^{(1)},H_i}(r \mid k)$, i = 0, 1.   (2)

where $P_{R|I^{(1)},H_i}(r \mid k)$ is the conditional photoelectron probability
and $P_{I^{(1)}}(k)$ is the probability that the multiple access interference
term $I^{(1)}$ takes on the value k. The probability of
error can be computed for equiprobable hypotheses as

$P_e = \frac{1}{2} \sum_{i=0}^{1} \sum_{r \in A_{1-i}} P_{R|H_i}(r)$   (3)

where $A_i$ is the decision region for $H_i$, i = 0, 1.

Importance Sampling: We obtain the Importance Sampling estimator
by rewriting the error rate in (3) as

$P_e = \frac{1}{2} \sum_{i=0}^{1} \sum_{r \in A_{1-i}} P_{R^*|H_i}(r \mid H_i)\, w(r \mid H_i)$   (4)

where $w(r \mid H_i) = P_{R|H_i}(r \mid H_i) / P_{R^*|H_i}(r \mid H_i)$ are the weights under $H_i$.
The "gain" of Importance Sampling over Monte Carlo simulations is given by
$\Gamma = M_{MC}/M_{IS}$, where $M_{MC}$ and $M_{IS}$ are the number of trials under
the respective methods. The sufficient conditions for achieving a
realistic "gain" reduce to $P_{R^*|H_i}(r \mid H_i) > P_{R|H_i}(r \mid H_i)$, $\forall r \in A_{1-i}$.
Since the optimum solution to maximizing $\Gamma$ yields a degenerate
biasing density [1], we look for a suboptimal solution satisfying the
sufficient conditions as given above. To make the problem of determining
the suboptimal biasing density tractable, we choose not
to bias the multiple access interference parameters [1] and look for
biasing densities of the form

P4K-1) 

PR‘\H,(r)=  53  FK»(<^)P«'|i!..«i(’‘l*'i).  «  =  0,1-  (5) 

*=o 

Further,  we  show  that  when  an  exponential  change  of  measure  yields 
the  biasing  density  (i.e.,  Pr*(t)  =  e*”  j^^),  the  sufficient  conditions 
reduce  to  solving  the  following  minmax  problem  [1]: 

n>in  max  {(-(jj  +  ViN  -|-  *r)  +  i/,*  -I- 1)  -|-  r  *)}' 

where $k \in [0, \rho_c(K-1)]$ and $P^*_{R|H_i}$ is parametrized by the parameter
«'*• 
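
The importance sampling idea described above can be illustrated with a minimal sketch that estimates a small Poisson tail probability, such as a photoelectron count exceeding a threshold, using an exponentially tilted biasing density. This is not the estimator of the abstract: the single-user Poisson model, the function names, and the choice of tilting the mean up to the threshold are assumptions made purely for illustration, and the multiple access interference is ignored.

```python
import math
import random


def poisson_log_pmf(r, mean):
    """Log of the Poisson pmf, computed in the log domain to avoid overflow."""
    return -mean + r * math.log(mean) - math.lgamma(r + 1)


def poisson_tail_exact(mean, threshold, extra_terms=200):
    """P(R >= threshold) for R ~ Poisson(mean), summing the tail directly."""
    return sum(math.exp(poisson_log_pmf(r, mean))
               for r in range(threshold, threshold + extra_terms))


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for moderate means."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def tail_by_importance_sampling(mean, threshold, trials, rng):
    """Estimate P(R >= threshold) by sampling from a tilted Poisson density.

    Tilting a Poisson(mean) pmf by e^{s r} gives a Poisson with mean
    mean * e^s; here s is chosen (an assumption of this sketch) so that
    the tilted mean equals the threshold.
    """
    tilted_mean = float(threshold)
    s = math.log(tilted_mean / mean)
    total = 0.0
    for _ in range(trials):
        r = sample_poisson(tilted_mean, rng)
        if r >= threshold:
            # Weight = original pmf / biasing pmf.
            total += math.exp(tilted_mean - mean - s * r)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    print("exact tail      :", poisson_tail_exact(5.0, 20))
    print("importance est. :", tail_by_importance_sampling(5.0, 20, 20000, rng))
```

Because most biased samples fall in the error region, far fewer trials are needed than with plain Monte Carlo, where an event of this probability would almost never be observed in the same number of trials.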

Approximations: In the evaluation of the analytical probability of error
in equation (3), we need to compute the expectations in (2) over all the
information paths of the process. In general,
the distributions of the sum of the interfering intensities do not have
closed form expressions, and since each interference term takes values in
$\{0, 1, \ldots, \rho_c\}$, roughly $(\rho_c + 1)^{K-1}$ computations are
required. If we decompose the photoelectron count at the output of the
photodetector as $R = R_s + R_I$ (where $R_s$ and $R_I$ are contributions
due to the user intensity and the interference intensities, respectively),
then we can write
$\sum_{r \in A_1} P_{R|H_0}(r \mid H_0) = \sum_{r_I} p_{R_I}(r_I)\, Q_{R_s}(\gamma - r_I)$,
where $Q_{R_s}$ is the complementary cumulative distribution function of
the Poisson random variable with mean $\nu_i$, $\gamma$ the detector
threshold, and $p_{R_I}$ is the probability mass function of the
photoelectron count due to interfering intensities. By representing
$Q_{R_s}$ in terms of an approximate Fourier series as
$Q_{R_s}(x) = \sum_{m=-\infty}^{\infty} c_m e^{j m \omega x} + \epsilon(x)$,
where $\omega$ is the angular frequency term and $\epsilon(x)$ an error
term, we can write
$\sum_{r \in A_1} P_{R|H_0}(r \mid H_0) = \sum_{m=-\infty}^{\infty} c_m e^{j m \omega \gamma}\, \Phi_{R_I}(-j m \omega) + \epsilon$,
where $R_I$ has been decomposed into the sum of $K - 1$ independent random
variables, i.e., $R_I = \sum_{k=2}^{K} R_I^{(k)}$. The moment generating
functions $\Phi_{R_I^{(k)}}$ can be evaluated from knowing the
probabilities $P_{I^{(i)}}$. If we can truncate the above series to $M$
terms, we see that the computations required are equal to
$\rho_c (K - 1) M$. Thus we have reduced the number of computations to be
linear in $K$ as opposed to being exponential in $K$.
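
The saving that comes from treating the interference as a sum of independent per-user terms can be sketched as follows. The example compares brute-force enumeration of all $(\rho_c + 1)^{K-1}$ interference patterns with successive convolution of the per-user pmfs. Note that this uses direct convolution rather than the Fourier series and moment generating function route of the abstract, and the per-user pmf values are hypothetical.

```python
from itertools import product


def convolve(p, q):
    """pmf of X + Y for independent X ~ p and Y ~ q on {0, 1, 2, ...}."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def sum_pmf_by_convolution(pmfs):
    """pmf of the sum, built one interferer at a time."""
    total = [1.0]
    for pmf in pmfs:
        total = convolve(total, pmf)
    return total


def sum_pmf_by_enumeration(pmfs):
    """Same pmf by visiting every joint outcome (exponential in len(pmfs))."""
    out = [0.0] * (sum(len(p) - 1 for p in pmfs) + 1)
    for combo in product(*[range(len(p)) for p in pmfs]):
        prob = 1.0
        for pmf, value in zip(pmfs, combo):
            prob *= pmf[value]
        out[sum(combo)] += prob
    return out


if __name__ == "__main__":
    # Hypothetical per-interferer pmf on {0, 1, 2}, i.e. rho_c = 2, K - 1 = 4.
    per_user = [[0.7, 0.2, 0.1]] * 4
    print(sum_pmf_by_convolution(per_user))
```

Both routines return the same pmf; only the enumeration cost grows exponentially with the number of interferers.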

ABSTRACT

Performance results for optimum demodulation of optical code
division multiple access (CDMA) signals are obtained. Upper and
lower bounds on the minimum probability of symbol error for a K-user,
symbol-asynchronous optical CDMA system with optical orthogonal
codes (OOCs) are derived and evaluated. An asymptotic efficiency
defined relative to the performance in known interference is introduced.
The results obtained exhibit the asymptotic efficiency of optimum
demodulation and suggest the existence of a significant performance
gap between the optimum receiver and the conventional correlation
receiver even in mild near-far environments.


Overview 

In this paper, we consider the performance analysis of optical CDMA
communications. Such formats are of interest in several emerging
applications, including indoor wireless communications and all-optical
processor interconnects. Because of its low complexity, the conventional
correlation receiver has been the focus of much work. It is well
known that conventional receiver performance suffers when the signals
of different users are received with unequal energies. In this situation,
the performance of optimum demodulation is of special interest.

In direct sequence optical CDMA, each user $k$ is assigned a signature
sequence $s_k \in \{0, 1\}^J$, which effectively divides the symbol
interval into $J$ "chips". User $k$ signals a symbol $b_k$ by transmitting an
optical pulse in each of the chip intervals corresponding to a "1" in the
signature sequence $s_k$. The $K$ signals are combined non-coherently
on the channel, which may be free-space or guided.


Demodulation is based on knowledge of the transmitter delays, energies,
and signature sequences and on direct detection of the received
signal over each chip interval. The observations may be modelled as
conditionally Poisson random variables with rates given by

Tfc  "f"  Sfc:  J+j-Tk  "I” 

for the observation over the $j$th chip of the $l$th symbol of the desired
user, user 1. The integer $\tau_k$ represents a chip-synchronous transmitter
delay relative to $\tau_1$, $\lambda_k$ corresponds to the energy of user $k$,
and $\lambda_0$ represents photodetection dark current.


The error probability analysis of this paper avoids the unrealistic
assumptions of symbol-synchronous transmission and random
codes which previous analyses have required [1]. The performance
measures considered include the single-user lower bound, the
known-interference lower bound (achieved by the likelihood-ratio test when
$b_k$ for $k \neq 1$ are known), the Chernoff upper bound, and the performance
of a modified conventional detector [2], which ignores observations
from those chips during which interfering users are transmitting. The
correlation receiver performance in known interference is also evaluated.
The use of OOCs [3] with maximum cross-correlations equal to
1 allows the optimum decision for $b_1(l)$ to be made symbol-by-symbol.
We also employ saddle-point approximations [4], which expedite
numerical analysis and provide exceptionally good approximations.
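
The cross-correlation property that the analysis relies on can be checked numerically. The sketch below computes periodic correlations of {0, 1} sequences and verifies them on a small, well-known two-codeword example of length 13 and weight 3; it is not the length-73, weight-4 code used for the numerical results, and the helper names are chosen for illustration only.

```python
def codeword_from_positions(positions, length):
    """Build a {0,1} sequence with ones at the given chip positions."""
    chips = [0] * length
    for p in positions:
        chips[p % length] = 1
    return chips


def periodic_correlation(x, y, shift):
    """Number of coinciding pulses between x and a cyclic shift of y."""
    n = len(x)
    return sum(x[i] * y[(i + shift) % n] for i in range(n))


def max_cross_correlation(x, y):
    """Largest periodic cross-correlation over all cyclic shifts."""
    return max(periodic_correlation(x, y, s) for s in range(len(x)))


def max_auto_sidelobe(x):
    """Largest periodic autocorrelation over all non-zero shifts."""
    return max(periodic_correlation(x, x, s) for s in range(1, len(x)))


if __name__ == "__main__":
    # Illustrative codewords: pulse positions {0,1,4} and {0,2,7} modulo 13.
    c1 = codeword_from_positions([0, 1, 4], 13)
    c2 = codeword_from_positions([0, 2, 7], 13)
    print(max_auto_sidelobe(c1), max_auto_sidelobe(c2),
          max_cross_correlation(c1, c2))
```

With maximum cross-correlation 1, an interferer can overlap the desired user's pulses in at most one chip per symbol, whatever its relative delay.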


Numerical results are presented in Figure 1 for a 4-user system
in which $\lambda_1 = \lambda_2 = \lambda_3$. Curves are plotted versus the
near-far ratio (NFR), defined here as $\lambda_4/\lambda_1$. The OOCs utilized
have weight equal to 4 and length equal to 73. Even when all users have
equal energies (NFR = 0 dB), optimum performance is still more than two
orders of magnitude better than conventional receiver performance.

In [2] an asymptotic multiuser efficiency for the optical CDMA
channel is defined relative to single-user performance. Here we define
an efficiency relative to performance in known interference. (Unlike
the analogous situation for radio-frequency channels, known interference
is not equivalent to no interference.) Let $P_{opt}(\lambda_1)$ denote the
optimum error probability for user 1 and $P_{ki}(\lambda_1)$ the
known-interference lower bound. For a given error probability
$P_e = P_{opt}(\lambda_1)$, let $\tilde{\lambda}_1$ represent the energy
required for user 1 to achieve $P_e$ in known interference
($P_{ki}(\tilde{\lambda}_1) = P_e$). Then the asymptotic relative efficiency
is given by

$$\eta = \lim_{\lambda_1 \to \infty} \frac{\tilde{\lambda}_1}{\lambda_1} \qquad (1)$$

where the energies of the interferers are held proportional to the
energy of user 1, i.e., $\lambda_k = \epsilon_k \lambda_1$, $\forall k \ge 2$.


In Figure 2, the asymptotic efficiency of the optimum receiver for
a 2-user synchronous system with arbitrary $\{0, 1\}$ signature sequences
and $\lambda_0 = 0$ is plotted versus $\epsilon_2$, the near-far ratio, for
various values of $r$, where $1 - r$ represents the fraction of symbol
energy transmitted during periods of isolated transmission.


Abstract

A fully asynchronous single user receiver in a code-division
multiple-access (CDMA) system is considered. It is assumed
that the receiver has no knowledge of the signature waveforms
or timing information of other users. The receiver is trained by
a known training sequence prior to data transmission, and
continuously adjusted by an adaptive algorithm during data
transmission. An adaptive, fractionally spaced least mean square
(LMS) filter is employed for each user separately, instead of
matched filters with constant coefficients. The proposed
receiver is as simple as a standard single user detector receiver but
it achieves essential advantages with respect to timing recovery,
multiple-access interference elimination, narrowband
interference suppression and user privacy. In comparison to the
centralised linear multi-user detector it has the same bit error
performance while the computational complexity is substantially
lower and independent of the number of users. The receiver
structure is investigated and tested by simulation using a set
of Gold sequences of length 31. Experimental results show that
a considerable improvement in bit error rate is achieved with
respect to the conventional single-user receiver.


1  Introduction 

Several approaches to the CDMA demodulation problem have
been considered so far. The conventional approach consists
in demodulating each signal using a single user detector with
a matched filter, thereby ignoring the multiple access
interference (MAI) caused by cross-correlation between signals of
different users [3]. This approach has two major shortcomings:
(1) high sensitivity to the near-far effect, and (2) the channel
capacity being interference limited, instead of being limited by
the AWGN level. On the other hand, this approach has the
advantage of being very simple to implement.

An alternative receiver structure is a maximum likelihood
multi-user demodulator for synchronous and asynchronous
transmission. The maximum likelihood multi-user receiver
consists of a bank of matched filters followed by a Viterbi maximum
likelihood detector [1]. The computational complexity of the
optimum demodulator increases exponentially with the number
of users.

In a number of papers a less complex class of suboptimal
centralised linear multi-user detectors (CLMD) is proposed, where
the computational complexity of the receiver increases linearly
with the number of users [3]. The receiver is "near-far resistant",
eliminating the need for strict power control. The drawback
of this approach is that the parameters of all users, including
signatures, timings and carrier phases, have to be known.
The accuracy in estimation of these parameters strongly
influences the single user detection process, and instability can
spread to other users, making the whole system unstable. The
CLMD is considerably more complex than the conventional single
user detector. The CLMD proved that the capacity limitation
of the CDMA system by MAI is a consequence of the conventional
single user approach rather than an inherent property
of the CDMA system. Moreover, it proved that this limitation
can be overcome by a linear receiver.

In this paper we consider a single user detector approach. A
single adaptive minimum mean square error (MMSE) filter
assigned to each user eliminates interference from other users to
the same extent as a linear multi-user detector does. However,
timing, signature or carrier phase information from other users
is not needed. Receivers perform independently, making the
system more stable and suitable for adaptive implementation.
An adaptive filter is necessary to handle time-varying system
parameters. It is important to note that, in contrast to the
centralised multi-user receiver, the observation vector is not the
output from the bank of matched filters, but the sampled signal
itself. Another important feature of the proposed receiver is
the use of a fractionally spaced filter, which is insensitive to the
time differences in the signal arrival times of different users.
Thus, the receiver timing recovery is greatly simplified (if
necessary at all) [4].
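
The trained adaptive filter described above can be sketched in a much simplified setting. The example below trains an LMS filter on the sampled chips of a noise-free, chip- and symbol-synchronous two-user system and compares its decisions with those of a fixed matched filter when the interferer is ten times stronger. The synchronous model, the length-7 signatures, the step size, and all function names are assumptions made for illustration; the receiver of the paper is fractionally spaced and asynchronous.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalise(chips):
    """Scale a +/-1 chip sequence to unit energy."""
    scale = math.sqrt(len(chips))
    return [c / scale for c in chips]


def received_vector(b1, b2, s1, s2, a2):
    """Chip-rate samples: desired user (amplitude 1) plus one interferer."""
    return [b1 * x + a2 * b2 * y for x, y in zip(s1, s2)]


def train_lms(s1, s2, a2, step, n_train, rng):
    """LMS training on known bits of user 1; interferer data is unknown."""
    taps = [0.0] * len(s1)
    for _ in range(n_train):
        b1, b2 = rng.choice((-1, 1)), rng.choice((-1, 1))
        r = received_vector(b1, b2, s1, s2, a2)
        error = b1 - dot(taps, r)
        taps = [w + step * error * x for w, x in zip(taps, r)]
    return taps


def count_errors(taps, s1, s2, a2, n_test, rng):
    """Bit errors of the linear detector sgn(taps . r) for user 1."""
    errors = 0
    for _ in range(n_test):
        b1, b2 = rng.choice((-1, 1)), rng.choice((-1, 1))
        r = received_vector(b1, b2, s1, s2, a2)
        decision = 1 if dot(taps, r) >= 0 else -1
        errors += decision != b1
    return errors


if __name__ == "__main__":
    s1 = normalise([1, 1, 1, -1, 1, -1, -1])
    s2 = normalise([1, 1, -1, 1, -1, -1, 1])
    taps = train_lms(s1, s2, 10.0, 0.005, 5000, random.Random(1))
    print("LMS errors           :", count_errors(taps, s1, s2, 10.0, 500, random.Random(2)))
    print("matched filter errors:", count_errors(s1, s1, s2, 10.0, 500, random.Random(2)))
```

The training loop uses only the desired user's bits and the received samples, which mirrors the claim that no interferer signatures or timing are required.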

Coherent detection of asynchronous Code-Division Multiple-Access
(CDMA) data transmissions over a Rician fading channel
is considered in the context of a multipoint-to-point communication
system, where a centralised receiver that is assumed to have
knowledge of the signature signals of the system users, including
the arrival times of the former, observes a superposition of the
specular and faded signal components of each of the users in
additive noise. The channel itself is assumed to be non-dispersive,
and with no fading memory, i.e., the random attenuations and
phase shifts experienced by the users’ transmissions over different
bit intervals are assumed to be independent of each other,
rendering them inestimable.

It turns out that this channel is equivalent to a fictitious
CDMA-AWGN (Additive White Gaussian Noise) channel from
the point of view of optimal detection [1]; it may be shown that
the optimum decision rules over the two channels as well as the
statistical characterisation of the sufficient statistics in each case
parallel each other. Unlike the optimum AWGN multiuser detector,
however, the optimum faded detector is unimplementable
using a Viterbi algorithm. This motivates the derivation of the
polynomial-complexity faded decorrelator for this channel.

Detector performance in a multiuser faded environment may
be adversely affected by both the interferer specular (or known)
and faded (or inestimable) signal components. The performance
limiting effect of the former on conventional detection as
well as the ability of fading channel strategies to withstand such
interference has been studied earlier [1]. The issue of the
limitations on detector performance, if any, due to interfering fading
is addressed here. To this end, we introduce the fading susceptance
and fading resistance measures; the former as a measure of
whether degradations in detector performance due to interfering
fading are so great as to prevent them from being competitive
with optimum detection strategies over single-user channels,
and the latter as a measure that captures the ability of detectors
to withstand such interference. These asymptotic measures
characterize detector performance in regions where the fading of
the interfering users, as opposed to their specular energies, is the
dominant impediment to detection.

An analysis of the conventional detector’s performance in the
multiuser faded environment reveals that fading interference is
capable of limiting detector performance in a manner similar to
specular component interference, bringing out the hitherto
unrecognized performance limiting effect of fading interference in
Rician fading CDMA channels. The AWGN multiuser detectors
(optimal and decorrelating), which are designed for the
CDMA-AWGN channel, while sub-optimally resistant to specular
interference, pay a penalty for ignoring the presence of fading in that
they are also found to be susceptible to fading interference. Thus
we demonstrate that, even in environments where specular
interference is marginal, both of the above detectors are incapable of
competing with detectors of isolated transmissions, if the fading
of an interfering user dominates the background noise. The faded
detectors, both optimal and decorrelating, are however found to
withstand such interference. In light of their previously
demonstrated resistance to specular interference, and the additional
computational considerations, our results make the case for the
use of the faded decorrelators for multiuser detection over
asynchronous Rician faded CDMA channels.

The plots of Figure 1 are an illustration of the implications
of our results for detector Bit-Error Rates (BERs) in realistic
as opposed to asymptotic environments. We observe that, over
a two-user channel, even with a weak interferer specular signal,
the first user of the faded decorrelator alone exhibits an
exponential decay in BER with increasing Signal-to-Noise Ratio
(SNR), with both the conventional detector and the AWGN
decorrelator forced by the non-zero, fixed interferer fading to
approach error floors.

Figure 1. BERs of the conventional detector, the AWGN and
faded decorrelators versus SNR.

Abstract

We address the problem of Code-Division Multiple
Access (CDMA) systems in a multipath environment where
a high data rate introduces intersymbol interference as
well as inter-user interference. We discuss a linear equalizer
as well as a reduced-state sequence estimation (RSSE)
scheme for CDMA channels with intersymbol interference.
The linear equalizer is a modification of the decorrelating
detector proposed by Lupas and Verdu. The
RSSE collapses $2^K$ states into 2 states, where $K$ is the
number of users. Both techniques are superior to a traditional
matched filter detector (RAKE) when the interuser
interference and/or the intersymbol interference
is relatively strong. In a severe near-far environment, the
RSSE can show significant (2-3 dB) improvement over
the linear equalizer in terms of receiver signal-to-noise
ratio.


Direct Sequence Code-Division Multiple Access (CDMA) has
been proposed for commercial data networks. Its strength is that it
can increase the capacity of a system due to the absence of a
guardband requirement [5]. Its weakness is that it can suffer from
the near-far problem: a user experiencing strong interference
from other users (near) while its own signal is relatively weak
(far). Techniques such as power control have been proposed
for combating the near-far problem. We propose two additional
signal processing methods for when power control is not feasible.
The first technique is a linear equalizer, a modification of
the decorrelating detector proposed by Lupas and Verdu [2], [3].
The second is a reduced-state sequence estimation (RSSE), an
implementation of the Maximum-Likelihood Sequence Detector
(MLSD) for CDMA channels with intersymbol interference (ISI)
as well as interuser interference [1].

For modeling the CDMA channels with multipath, we assume
a maximum delay spread of 250 nanoseconds, with 40 chips
per bit and a bit rate of 5 Mb/s. We modulate the chips with
a square-root raised-cosine pulse. We experience ISI from the
adjacent data bit, as well as interuser interference. In simulation
we have restricted ourselves to 2 and 3 users, but the techniques
can be extended to multiple users. At the data and chip rates
and delay spreads we have assumed, our single-user ISI channel
is analogous to a $1 + aD$ channel. We also assume that the
chip sequence is repeated each information bit and the multipath
channels share the same group delay. In both the Linear Detector
and the RSSE we assume a good knowledge of the effective
multi-dimensional channel.

In  the  Linear  Detector,  we  match  the  received  signal  with 
each  user’s  multipath  channel  and  chip  sequence.  The  effective 
channel  response  after  matching  has  the  form  of  an  invertible 
matrix.  We  can  invert  or  decorrelate  the  channel  with  either 
a  zero-forcing  or  minimum-mean-square-error  solution.  It  can 
be  shown  using  the  matrix  inversion  lemma  that  this  solution  is 
equivalent  to  a  linear  equalizer  at  the  chip  rate. 
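
The zero-forcing inversion step can be illustrated in its simplest form. The sketch below applies the inverse of a 2 x 2 correlation matrix to the matched filter outputs of a noise-free, synchronous two-user system without ISI, which is a deliberate simplification of the multipath channel matrix discussed above. The correlation value, the amplitudes, and the function names are hypothetical.

```python
from itertools import product


def matched_filter_outputs(bits, amplitudes, rho):
    """Noise-free matched filter outputs y = R A b for two users."""
    x1 = amplitudes[0] * bits[0]
    x2 = amplitudes[1] * bits[1]
    return (x1 + rho * x2, rho * x1 + x2)


def decorrelate(y, rho):
    """Zero-forcing step: apply the inverse of R = [[1, rho], [rho, 1]]."""
    det = 1.0 - rho * rho
    return ((y[0] - rho * y[1]) / det, (y[1] - rho * y[0]) / det)


def sign(value):
    return 1 if value >= 0 else -1


if __name__ == "__main__":
    rho, amplitudes = 0.6, (1.0, 10.0)  # weak user 1, strong user 2
    for bits in product((-1, 1), repeat=2):
        y = matched_filter_outputs(bits, amplitudes, rho)
        z = decorrelate(y, rho)
        print(bits, "matched:", sign(y[0]), "decorrelated:", sign(z[0]))
```

In this toy case the matched filter decision for the weak user follows the strong user's bit, while the decorrelated output depends only on the weak user's own bit. With noise present, the inversion would also enhance the noise, which is one motivation for the minimum-mean-square-error alternative mentioned above.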

Implementation at the chip rate has the advantage of not
explicitly needing to evaluate the interfering users’ channel
responses. For the chip-rate equalizer we can treat the interfering
users as noise, and estimate the covariance matrix via a training
sequence [4].

The RSSE algorithm is an implementation of the MLSD that
reduces the number of states from $2^K$ to 2. In each state the
elements differ by at least 2 user values. We use a modified
Viterbi algorithm where first we search among possible pairs of
inputs within one state, then we find the minimum cost in each
of the 2 states of our trellis. The new minimum distance for the
RSSE is the minimum of the trellis distance and the distance
between the pulse responses to the elements within the states.
The trellis minimum distance is determined by the smallest user
received energy. By forcing the elements within a state to differ
by at least two values, the system must mistake the output from
two different users before deciding incorrectly. In a severe
near-far environment, the minimum distance of the trellis will often be
the smaller of the two distances, guaranteeing optimum detection
in the presence of additive white Gaussian noise.

We  have  simulated  the  performance  of  the  linear  equalizer 
and  the  RSSE  and  compared  them  to  the  output  of  a  matched 
filter  (RAKE)  detector  in  multi-channel  environments  where  one 
user  experiences  a  near-far  problem.  We  have  found  that  both 
techniques have an SNR 15-20 dB greater than the traditional
RAKE,  with  the  RSSE  performing  2-3  dB  better  than  the  linear 
equalizer  in  most  severe  near-far  environments  with  intersymbol 
interference. 

Summary

Multiuser detectors have superior performance over their
single-user counterparts in a multiple-access channel, assuming perfect
knowledge of system parameters [1-3]. In this paper we extend the
analysis of multiuser detectors in fading channels by incorporating
the effects of imperfect parameter estimates on symbol error
probability. This type of analysis should be useful in designing
multiuser receivers, showing the error rate sensitivity to channel
parameter mismatch.

We focus on a synchronous CDMA channel shared by $K$ users
where the signal of each user arrives at the central receiver through
an independent flat Rayleigh fading channel. The central receiver
has the knowledge of the signature waveforms of all users, and the
outputs of a matched filter bank provide the sufficient statistics.
We also assume that a state space description of the fading
distortion is available.

In a single-user situation an optimum receiver structure in the
flat fading channel [4] consists of an adaptive estimator of fading
distortion, and a detector which utilizes these estimates. The
estimation of the complex channel distortion performs the task of the
carrier recovery. Given the state-space model of the channel
distortion, the Kalman filter is the optimal, minimum variance state
estimator. A suboptimum, realizable receiver can be implemented
using the decision-directed approach where the data dependence is
removed from the matched filter output.
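
The Kalman estimation step can be sketched for the simplest possible state-space model. The example tracks a real-valued, first-order autoregressive fading coefficient observed in white noise, assuming the data dependence has already been removed. The scalar real-valued model, the parameter values, and the function names are assumptions for illustration; the fading distortion in the paper is complex-valued and its state-space description is not given here.

```python
import random


def kalman_step(x_est, p_est, y, a, q, r):
    """One predict/update cycle for x[n] = a x[n-1] + w, y[n] = x[n] + v."""
    x_pred = a * x_est
    p_pred = a * a * p_est + q
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (y - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new


def simulate_tracking(a, q, r, n_steps, rng):
    """Track a simulated fading coefficient; return (empirical MSE, final P)."""
    stationary_var = q / (1.0 - a * a)
    x = rng.gauss(0.0, stationary_var ** 0.5)
    x_est, p_est = 0.0, stationary_var
    squared_error = 0.0
    for _ in range(n_steps):
        x = a * x + rng.gauss(0.0, q ** 0.5)   # fading process
        y = x + rng.gauss(0.0, r ** 0.5)       # noisy observation
        x_est, p_est = kalman_step(x_est, p_est, y, a, q, r)
        squared_error += (x_est - x) ** 2
    return squared_error / n_steps, p_est


if __name__ == "__main__":
    mse, p_final = simulate_tracking(0.99, 0.0199, 0.1, 5000, random.Random(0))
    print("empirical MSE:", mse, "predicted error variance:", p_final)
```

The error variance that the filter reports is the kind of quantity that enters the error probability analysis as the variance of the channel estimate.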

Multiuser carrier recovery can be accomplished in two ways.
The first approach is proposed in [5], in which the matched filter
outputs are decorrelated and each user employs phase estimators
which assume isolated transmission. In this case the
multiple-access interference is removed from the matched filter outputs at
the expense of noise enhancement and correlation, which affects the
performance of the carrier recovery circuit. We also consider the
vector generalization of the receiver proposed for the single-user
channel [4]. Due to synchronism among the users, the data
dependency can be removed in a decision-directed manner and the joint
phase estimates are obtained by using a multi-input multi-output
Kalman filter.

We focus our analysis on two low-complexity suboptimum
multiuser detectors, the coherent and differentially coherent
decorrelating detector. The coherent decorrelating detector utilizes phase
estimates obtained by the aforementioned carrier recovery
techniques. In this case, the probability of error can be calculated
using Stein’s unified analysis [7]. Assuming perfect symbol phase
elimination, the lower bound on the error probability is given by


1  ~  Gkk 


(1) 


where $G_{kk}$ is the error variance of the phase estimate, $[R^{-1}]_{kk}$ is the
element of the inverse cross-correlation matrix, and $\gamma_k$ is the average
signal to noise ratio, all corresponding to the $k$th user. Note that
the error probability of the coherent decorrelating detector does
not depend on interfering signal amplitudes, although it depends
on the cross-correlations of normalized signature waveforms and
the estimation error. In contrast to the case of perfect estimation,
an error probability floor is observed, which depends on the error
variance related to the phase tracking inaccuracies.

Considering the carrier recovery as the estimation of the fading
distortion, we reveal a means for comparing coherent and
differentially coherent detectors [6]. In the case when we are not able
to estimate complex channel coefficients, both signal energies and
phases of all users are unknown at the central receiver, and we
resort to the differentially coherent decorrelating detector, applying the
differential decision logic after the decorrelating filter [2]. Taking
into consideration the performance degradation due to channel
phase changes over two consecutive signaling intervals, the error
probability expression for the $k$th user is

where

and $E_k$ is the energy per bit.

Several numerical examples will be provided for the comparison
of these multiuser detectors. For the coherent decorrelating
detector, the comparison of the two analyzed carrier recovery strategies
indicates that the joint phase detection results in smaller variance
of the phase estimate, resulting in better performance of the detector.
This is to be expected since the decorrelating filter enhances
the noise prior to carrier recovery. Although an error probability
floor is observed for both analyzed multiuser detectors, the coherent
detector outperforms the differentially coherent one. However,
this is true when comparing the lower bound, for which perfect elimination
of the symbol phase has been assumed, to the exact expression for
the error probability of the differentially coherent scheme.

Abstract. The performance of a finite complexity Minimum Mean
Squared Error (MMSE) linear detector for demodulating Direct
Sequence Spread-Spectrum (DS/SS) Code Division Multiple Access
(CDMA) signals is studied. The MMSE detector is near-far resistant,
and can be implemented adaptively when no explicit knowledge of the
interferers' signature sequences is available. We assume that users are
assigned random binary signature sequences and derive upper and lower
bounds on the average near-far resistance of the MMSE detector. For
synchronous CDMA, the MMSE detector considered has the same
near-far resistance as the maximum likelihood and decorrelating detectors, so
that the bounds derived apply to these detectors as well. Approximate
expressions for average error probability and signal-to-interference ratio
are also presented, and are compared with the analogous results for the
matched filter receiver with random signature sequences.

I.  INTRODUCTION 

Minimum Mean Squared Error (MMSE) linear detection for
direct-sequence spread-spectrum (DS/SS) signals has recently been
considered in [1], [4]-[5]. In [4] MMSE linear detectors of varying
complexity were proposed, and in [5] the near-far resistance and error probability
of these detectors were evaluated for a specific assignment of signature
sequences. Assuming that the complexity of the detector is matched to
the number of strong interferers, these detectors do not suffer from the
near-far problem. Furthermore, the MMSE criterion leads to adaptive
implementations in which the interference parameters are not explicitly
known a priori. These schemes are decentralized in the sense that they
are designed to demodulate a single user in the presence of
multiple-access interference, as opposed to the centralized demodulation of all
active users described in [2]-[3] and the references therein.

Here we analyze the performance of the N-tap MMSE detector (N
is the processing gain), introduced in [4], assuming that the signature
sequences assigned to different users are independent random binary
sequences. The performance measures are averaged over the signature
sequences of all the users. Although deterministic sequences are used in
practice, the assumption of random signature sequences yields a rough
characterization of system performance in terms of a few key system
parameters (the processing gain N and the number of active users K in
this case). This approach has been extensively employed to analyze the
performance of the matched filter receiver. It is shown that the N-tap
MMSE detector achieves significant performance gains relative to the
matched filter.

The N-tap MMSE detector consists of an N-tap linear filter followed
by a threshold detector. The tap spacing is assumed to be the chip
interval, and the taps are selected to minimize the Mean Squared Error
(MSE) between the detected and transmitted symbols. We derive upper
and lower bounds on the average near-far resistance of this detector,
together with approximations for the signal-to-interference ratio and the
error probability. For a synchronous system, it is interesting to note that
the MMSE detector has the same near-far resistance as centralized detection
schemes such as the maximum likelihood detector and the
decorrelating detector (see [2]-[3]), so that the bounds on near-far resistance
given here apply to these latter detectors as well.

For the purpose of exposition, we consider a system in which the
transmissions are both chip- and symbol-synchronous. Results for asynchronous
systems have also been obtained, but are omitted from this
summary.

II. SYSTEM MODEL AND RESULTS

Consider the equivalent discrete-time system obtained by sampling
the output of a filter matched to the chip waveform at the chip rate. There
are then N samples per bit interval, which form a received vector
$r \in \mathbb{R}^N$. Denoting the $k$th bit of user one (the desired user) as
$b_1[k] \in \{-1, 1\}$, the N-tap MMSE detector forms the estimate
$\hat{b}_1[k] = \mathrm{sgn}(c^T r)$, where $c$ is selected to minimize
$\mathrm{MSE} = E\{(c^T r - b_1[k])^2\}$. For a synchronous system, the received vector
$r \in \mathbb{R}^N$ corresponding to the $k$th bit is given by

$$r[k] = \sum_{j=1}^{K} b_j[k] A_j a_j + n[k],$$

where the vector $a_j$ is the signature sequence of the $j$th user, $A_j$ is the
received amplitude of user $j$, and the noise vector $n$ is Gaussian with
mean zero and covariance matrix $\sigma^2 I_N$, where $I_N$ denotes the $N \times N$ identity
matrix. The signature sequences $a_j = (a_j[0], a_j[1], \ldots, a_j[N-1])^T$,
$j = 1, \ldots, K$, are assumed to be random. That is, $a_j[i]$,
$0 \le i \le N-1$, are independent random variables each taking value +1 or
-1 with equal probability.
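
As a rough illustration of the model above, the following Python sketch computes the MMSE tap vector for one fixed set of signature sequences. It assumes independent equiprobable bits, so that the normal equations read $(\sum_j A_j^2 a_j a_j^T + \sigma^2 I_N)\, c = A_1 a_1$; the function names `solve` and `mmse_taps` are hypothetical and do not come from the paper.

```python
def solve(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(x) for x in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def mmse_taps(signatures, amplitudes, noise_var):
    """Taps c minimizing E[(c^T r - b_1)^2]; user 1 is signatures[0].

    Assumes independent equiprobable +/-1 bits and noise_var > 0.
    """
    n = len(signatures[0])
    # Covariance of the received vector: sum_j A_j^2 a_j a_j^T + noise_var * I.
    cov = [[noise_var if i == j else 0.0 for j in range(n)] for i in range(n)]
    for a, amp in zip(signatures, amplitudes):
        for i in range(n):
            for j in range(n):
                cov[i][j] += amp * amp * a[i] * a[j]
    # Cross-correlation between r and the desired bit: A_1 a_1.
    rhs = [amplitudes[0] * x for x in signatures[0]]
    return solve(cov, rhs)
```

With a single user the taps reduce to a scaled matched filter; with one very strong interferer the taps become nearly orthogonal to the interferer's signature, which is the zero-forcing behaviour discussed next.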

The near-far resistance of the detector is a measure of the robustness
of the detector with respect to variations in the received interference
power (see [2]-[3] for a technical definition). If the users' signature
sequences are not orthogonal, then the near-far resistance of the matched
filter detector is zero. For the MMSE detector considered, the near-far
resistance is evaluated by letting the interference amplitudes $A_j \to \infty$. In
this case the MMSE solution for $c$ is the orthogonal projection of $a_1$ onto
the orthogonal complement of the space spanned by the interfering vectors
$a_2, \ldots, a_K$. That is, the
MMSE solution becomes the zero-forcing solution in the sense that the
interference is completely suppressed (at the expense of enhancing the
noise). Denoting the preceding orthogonal projection as $o_1$, the near-far
resistance of the MMSE detector is given by $\eta = \|o_1\|^2 / \|a_1\|^2$.

Let $R$ denote the normalized crosscorrelation matrix of the interfering
users' signature sequences. That is, $R_{ij} = (a_i^T a_j)/N$ for $2 \le i, j \le K$.
Also define the normalized crosscorrelation of the desired vector with the
$i$th interference vector as $p_i = (a_1^T a_i)/N$. Then the near-far resistance of
the MMSE detector considered can be written as $\eta = 1 - p^T R^{\dagger} p$, where
$R^{\dagger}$ is a pseudo-inverse of $R$. Our main results are bounds on the
expected value of $\eta$, where expectation is with respect to the users' signature
sequences. Specifically, we first show that $E[\eta] = 1 - E[d_I]/N$,
where $d_I$ is the (random) dimension of the subspace of $\mathbb{R}^N$ spanned by
the interference vectors $a_2, \ldots, a_K$. We then obtain upper and lower
bounds for $E[d_I]/N$, and thereby obtain the following upper and lower
bounds for the average near-far resistance,

$$1 - (K-1)/N \le E[\eta] \le 1 - f(K-1)\,[(K-1)/N],$$

where $f(n) = \prod_i (1 - 2^{\ldots})$. Note that the upper and lower bounds are
tight for $K \ll N$. Further, the upper bound can be tightened by applying
a stochastic domination argument.
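
The identity $E[\eta] = 1 - E[d_I]/N$ can be checked numerically. The sketch below is a hedged illustration rather than the authors' procedure: it computes $\eta$ as the fraction of the energy of $a_1$ lying outside the span of the interferers (via Gram-Schmidt) and averages it over random binary signature sequences. All function names, the tolerance and the trial count are assumptions.

```python
import random


def orthonormal_basis(vectors, tol=1e-9):
    """Gram-Schmidt basis for the span of the given vectors (dependent ones are dropped)."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            dot = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - dot * bi for wi, bi in zip(w, b)]
        norm = sum(wi * wi for wi in w) ** 0.5
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def near_far_resistance(a1, interferers):
    """Fraction of the energy of a1 orthogonal to the span of the interferers."""
    residual = [float(x) for x in a1]
    for b in orthonormal_basis(interferers):
        dot = sum(ri * bi for ri, bi in zip(residual, b))
        residual = [ri - dot * bi for ri, bi in zip(residual, b)]
    return sum(r * r for r in residual) / sum(x * x for x in a1)


def average_near_far_resistance(n_chips, n_users, trials=2000, seed=0):
    """Monte Carlo estimate of E[eta] over random +/-1 signature sequences."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        seqs = [[rng.choice((-1, 1)) for _ in range(n_chips)]
                for _ in range(n_users)]
        total += near_far_resistance(seqs[0], seqs[1:])
    return total / trials
```

For example, `average_near_far_resistance(16, 4)` should land close to the lower bound $1 - (K-1)/N = 13/16$, since three random sequences of length 16 are almost always linearly independent.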

We  also  consider  two  other  performance  measures,  the  signal-to- 
interference  ratio  and  the  error  probability.  Approximations  assuming 
large N are derived for the expected values of these quantities. Numerical
results contrasting the different performance of the MMSE and
matched  filter  receivers  for  random  signature  sequences  will  be  presented 
at  the  conference.  In  addition,  analogous  results  for  asynchronous  sys¬ 
tems  will  be  mentioned. 

ASYMPTOTIC MULTIUSER EFFICIENCY FOR 2-STAGE DETECTORS
IN AWGN CHANNELS

David Brady, ECE Dept., Northeastern University, Boston, MA 02115


Abstract 

In the AWGN multiple-access channel with binary phase-shift
keying modulation, the user error probability for a
given demodulator vanishes exponentially with the noise level, with
exponent $-\eta_k \mathrm{SNR}_k/2$, where $\eta_k$ is the asymptotic multiuser efficiency
(AME), and $\mathrm{SNR}_k$ is the received signal-to-background-noise
ratio. Thus, the asymptotic multiuser efficiency is an attenuation
of the error rate exponent for isolated transmission and
maximum a posteriori demodulation, and provides a simple yet
precise means of comparing multiuser receivers for sufficiently
low noise levels. To date, this parameter is only known for the
following receivers in the 2-user, asynchronous AWGN channel:
the maximum likelihood sequence detector, the decorrelating
detector, the linear MMSE detector, and the conventional detector.
In this talk the asymptotic multiuser efficiencies for a
class of detectors for the 2-user, asynchronous AWGN channel
will be presented. This class may be loosely described as receivers
which estimate and subtract multiple-access interference
(MAI) by using tentative data decisions, and includes the two-stage
detectors with either conventional or decorrelated tentative
decisions. The asymptotic multiuser efficiencies for this class of
detectors clearly indicate regions for which a given user should
avoid updating tentative decisions and suggest combinations of
the above receivers to improve single-user performance. This
technique applies to the AME of soft tentative decision strategies
as well, and we demonstrate that the near-far resistance of
two-stage detectors may be markedly improved using soft decision
nonlinearities. Below we present an outline of the approach
for conventional tentative decisions.

System  Model 

The matched-filter output for user 1 at time 0 may be written
as

$$y_1(0) = b_1(0) w_1 + b_2(-1)\rho_{21} + b_2(0)\rho_{12} + n_1(0),$$

where $b_k(i)$ is the binary antipodal data of user $k$ during time
$[iT, (i+1)T)$, $w_k$ is the energy of the symbol
waveform for user $k$, and $\rho_{12}$ ($\rho_{21}$) describes the partial cross-correlation
among the asynchronous waveforms $s_1(t)$ and $s_2(t)$.
In general, we define $n_k(j)$ as the Gaussian noise component
in $y_k(j)$, the matched-filter output for user $k$ at the end of the
$j$th symbol period. A general two-stage detector forms a final
decision for $b_1(0)$ via sign detection

$$\hat{b}_1(0) = \mathrm{sgn}[y_1(0) - \tilde{b}_2(-1)\rho_{21} - \tilde{b}_2(0)\rho_{12}]$$

where $\tilde{b}$ denotes a tentative decision for the symbol $b$. If conventional
tentative decisions are to be employed, then

$$\tilde{b}_2(i) = \mathrm{sgn}[y_2(i)].$$
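
The two-stage rule with conventional tentative decisions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names are hypothetical, ties are broken as sgn(0) = +1, and the numerical values in the usage note are made up for the example.

```python
def sgn(x):
    """Sign function; the tie-break sgn(0) = +1 is an assumption."""
    return 1 if x >= 0 else -1


def two_stage_decision(y1_0, y2_prev, y2_curr, rho21, rho12):
    """Final decision for b_1(0) using conventional tentative decisions for user 2.

    y1_0    : matched-filter output y_1(0) of user 1
    y2_prev : matched-filter output y_2(-1) of user 2
    y2_curr : matched-filter output y_2(0) of user 2
    rho21, rho12 : partial cross-correlations of the two waveforms
    """
    b2_prev = sgn(y2_prev)  # tentative decision for b_2(-1)
    b2_curr = sgn(y2_curr)  # tentative decision for b_2(0)
    # Subtract the estimated multiple-access interference, then slice.
    return sgn(y1_0 - b2_prev * rho21 - b2_curr * rho12)
```

As a noiseless example, take $b_1(0) = 1$, $w_1 = 1$, $b_2(-1) = b_2(0) = -1$, $\rho_{21} = 0.8$ and $\rho_{12} = 0.7$, so that $y_1(0) = -0.5$. The conventional detector `sgn(-0.5)` errs, whereas `two_stage_decision(-0.5, -2.0, -2.0, 0.8, 0.7)` recovers the transmitted bit because the tentative decisions for user 2 are correct.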

It has been shown that the error probability for user 1 may be
expressed as a finite number of terms, each of which is proportional
to

•^{/(0,i)nj(0)>/(0,iM0j))] 


where  =  ±1,  i,  j  6  (ji,  ±lj',  and  are  constants. 

An exact form for the exponential rate of this probability is
crucial to the solution of the asymptotic multiuser efficiency for
the two-stage detector, and is found via the following lemma.
Let the noise vector $[n_2(-1), n_1(0), n_2(0)]^T$ be Gaussian with
zero mean vector and autocovariance matrix $\sigma^2 K = \sigma^2 S S^T$,
and let the $i$th row of $S$ be denoted by $s_i$. Let $g = [g_1, g_2, g_3]^T$
denote  a  Gaussian  vector  with  zero  mean  and  autocovariance 
matrix  a^I,  and  let  Xi  ;  =  {g  :  'g  > 

(g  :  Si  S  >  u>i-2(i>2i+jpi2)}n{g  :  /(0,j)53  g  >  /(O,  j)o(0,i)}, 
where $s_i \cdot g$ denotes a vector inner product.

Lemma 


where  q  is  the  vector  which  satisfies 

$$q = \arg\min_{x \in X_{i,j}} \|x\|.$$


Numerical  Example 

Figure 1 shows the AME for user 1 in the asynchronous,
2-user AWGN channel with significant correlation among the
normalized waveforms. As usual, the AME is displayed as a
function of the relative energy of the interferer. We have shown
the AME for the maximum likelihood sequence detector (MLS),
the decorrelator, the conventional detector, and the two-stage
detector with both conventional and decorrelated tentative
decisions. Note that the AME of the two-stage detector
with decorrelated tentative decisions dominates that using conventional
tentative decisions. It is also of interest to note that
the AME of the two-stage detector with conventional tentative
decisions is dominated by that of the conventional detector for
sufficiently weak interference, and that the near-far resistance
of the former detector is zero. Both two-stage detectors exhibit
similar error rate exponents to their tentative decision counterparts
when the energies of the users are roughly the same.

Universal coding of non-discrete sources based on distribution
estimation consistent in expected information divergence

Andrew R. Barron, László Györfi, and Edward C. van der Meulen


We show how a certain distribution estimator $\mu_n^*$
which is consistent in expected information divergence
leads to a universal code for the class of all
probability measures $\mu$ on a non-discrete space
which are dominated in I-divergence by a known
probability measure $\nu$.

Let $\mu$ be an unknown probability measure on $\mathcal{X} =
\mathbb{R}^d$ and consider the problem of estimating $\mu$ based
on i.i.d. observations $X_1, \ldots, X_n$ from $\mu$. As a priori
information we assume that there exists a known
probability measure $\nu$ on $\mathcal{X}$ such that $I(\mu, \nu) < \infty$.
Define integers $m_n$, $0 < m_n < n$, and real numbers
$h_n > 0$. Let

be  a  sequence  of  partitions  of  X  such  that  each  AJ, , 
is  a  cube  of  width  h„  and  ,)  >  hf,.  Let  0  < 
a„  <  1  be  a  given  sequence  with 

$$\lim_{n \to \infty} a_n = 0.$$

Let $\mu_n$ denote the standard empirical measure for
$X_1, \ldots, X_n$. In [1] we introduced the distribution
estimator $\mu_n^*$ defined by

i/(j[  p  .) 

i=l 

and  proved  the  following  theorem. 

Theorem  1  If  lim  /i„  =  0,  lim  =  q,  and 

n— *00  n— ‘OO  ^ 

lim. sup  — <  1,  then 

n  — OO 

$$\lim_{n \to \infty} E(I(\mu, \mu_n^*)) = 0$$

for all $\mu$ such that $I(\mu, \nu) < \infty$.

Hence our distribution estimator $\mu_n^*$ is consistent
in expected information divergence for all $\mu$
for which $I(\mu, \nu) < \infty$. This consistent estimation
leads naturally to a universal source code (in the
sense of [2]) for the same class of distributions for
arbitrarily fine quantizations of the data in the following
way.

Since $\mathcal{X}$ is not a discrete space, data sequences
$X_1, \ldots, X_n$ cannot be represented exactly by a noiseless
source code. Nevertheless, for any given partition
$\mathcal{P}_n = \{A_{n,i}\}$ of $\mathcal{X}^n$, no matter how fine, we can
code the element of the partition that includes the
data in a uniquely decodable way, using a Shannon
code, which assigns a codeword of length
$\lceil \log 1/\eta_n(A_{n,i}) \rceil$ to $A_{n,i}$ for any given probability
distribution $\eta_n$ on $\mathcal{X}^n$. The redundancy $R_n(\mathcal{P}_n)$ of
this code equals $\frac{1}{n} I_n(\mu^n, \eta_n)$, where $I_n(\mu^n, \eta_n)$ denotes
the information divergence between $\mu^n$ and $\eta_n$
restricted to $\mathcal{P}_n$. The least upper bound $R_n^*$ (over
all partitions $\mathcal{P}_n$) on the redundancy is provided by
$\frac{1}{n} I(\mu^n, \eta_n)$. Based on our distribution estimator $\mu_n^*$
we can construct a distribution $\eta_n^*$ such that

Tl  Jl  *■■■ 

k=\ 

The latter term, being a Cesaro average, will tend
to zero as $E(I(\mu, \mu_n^*)) \to 0$. Hence, using $\eta_n^*$ to
encode elements of a partition $\mathcal{P}_n$, we obtain

Theorem 2 For all discrete memoryless sources
with marginal distribution $\mu$ for which $I(\mu, \nu) < \infty$,
there exists a universal uniquely decodable code for
any partition $\mathcal{P}_n$ such that

$$\lim_{n \to \infty} R_n^* = 0.$$

We conclude that the estimator $\mu_n^*$, being consistent
in expected information divergence, provides a
universal code for arbitrarily fine quantizations of
the data for the class of all distributions $\mu$ on $\mathbb{R}^d$
with $I(\mu, \nu) < \infty$.
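
The Shannon code on a finite partition used above can be illustrated with a short Python sketch. It is only a toy under stated assumptions: cell probabilities are given as plain lists, logarithms are base 2, and the function names are hypothetical. The redundancy computed here is the expected codelength minus the entropy of the true cell distribution, which equals the divergence restricted to the partition plus the rounding loss from the ceiling.

```python
import math


def shannon_code_lengths(cell_probs):
    """Codeword lengths ceil(log2(1/eta(A))) for each cell A of the partition."""
    return [math.ceil(-math.log2(p)) for p in cell_probs]


def kraft_sum(lengths):
    """Kraft sum; a value <= 1 guarantees a uniquely decodable prefix code exists."""
    return sum(2.0 ** -length for length in lengths)


def redundancy(true_probs, coding_probs):
    """Expected codelength under true_probs minus the entropy of true_probs (bits)."""
    lengths = shannon_code_lengths(coding_probs)
    return sum(p * (length + math.log2(p))
               for p, length in zip(true_probs, lengths) if p > 0)
```

For instance, coding cells with true probabilities (0.5, 0.25, 0.25) using the mismatched distribution (0.25, 0.25, 0.5) costs 0.25 extra bits per cell, which in this dyadic case is exactly the divergence between the two distributions on the partition.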

Sequential Model Estimation for Universal Coding and the
Predictive Stochastic Complexity of Finite-State Sources

Marcelo J. Weinberger, Meir Feder, Jorma Rissanen


In this work we consider sequential universal coding of Finite-State-Machine
(FSM) probabilistic sources. A unifilar FSM source $X$ over
a discrete alphabet $A$ with $\alpha$ letters is defined by an FSM $F$ and a
set of parameters $\theta = \{p(x|s), s \in \mathcal{S}, x \in A\}$, where the number of
states $S = |\mathcal{S}|$ is finite, and $F$ has an initial state $s_0$ and its progress is
determined by a next-state function

In the sequel we refer to $F$ as the machine supporting the model, and
it is assumed irreducible and aperiodic. The probability that $X$ assigns
to a string $x_1^n = x_1, \ldots, x_n$ is

$$P(x_1^n; F, \theta) = \prod_{i=1}^{n} p(x_i \mid s_{i-1}). \qquad (1)$$

The per-symbol entropy of blocks of length $n$ emitted by $X$ is denoted
$H_n(X; F)$.

It was shown in [1] and [2] that the average codelength $E\{L(x_1^n)\}$
of any encoder in compressing the outcome of every FSM source $X$,
except, possibly, a set of FSM sources whose volume vanishes as $n$
increases, satisfies

$$\frac{1}{n} E\{L(x_1^n)\} \ge H_n(X; F) + \frac{S(\alpha-1)}{2} \frac{\log n}{n} (1 - \epsilon) \qquad (2)$$

for every $\epsilon > 0$ and $n$ sufficiently large.

The lower bound in (2) can be achieved up to $O(n^{-1})$ by a simple
batch universal encoder which sends as a header the empirical counts in
each state of the FSM, and then assigns to the data a code matched to
these counts. If $F$ is unknown, it is estimated from the data and then
sent in the header as well at a cost $O(n^{-1})$. The model is estimated by
minimizing

$$-\frac{1}{n} \log P(x_1^n; F, \theta) + \frac{S(\alpha-1)}{2} \frac{\log n}{n} \qquad (3)$$

over $\theta$ and $F$. Note that

$$\min_{\theta} [-\log P(x_1^n; \theta, F)] = n \hat{H}(x_1^n; F), \qquad (4)$$

i.e., at the optimal choice of parameters, the negative logarithm of the string
probability becomes $n$ times the empirical entropy with respect to $F$. The model
selection rule of (3) minimizes the codelength of this batch universal procedure since
the first term represents the cost of encoding the data given the model
and its parameters, the second term represents the cost of encoding the
$S(\alpha - 1)$ empirical counts, and the cost of encoding the description of
$F$ is independent of $n$. The minimum description length of a sequence
with respect to a class of models, using a possibly batch procedure,
has been termed [3] the non-predictive stochastic complexity of that
sequence with respect to the class.
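
The batch model selection rule can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation: the FSM is represented by a hypothetical `next_state(state, symbol)` callable, logarithms are base 2, and the cost is the total (unnormalized) codelength, that is, $n$ times a criterion of the form (3).

```python
import math
from collections import defaultdict


def empirical_codelength(string, next_state, initial_state):
    """n times the empirical conditional entropy of the string w.r.t. the FSM, in bits."""
    counts = defaultdict(lambda: defaultdict(int))
    state = initial_state
    for symbol in string:
        counts[state][symbol] += 1          # occurrences of symbol in this state
        state = next_state(state, symbol)   # unifilar transition
    total = 0.0
    for per_state in counts.values():
        n_s = sum(per_state.values())
        for c in per_state.values():
            total -= c * math.log2(c / n_s)
    return total


def mdl_cost(string, next_state, initial_state, num_states, alphabet_size):
    """Data cost plus (1/2) log2(n) bits for each of the S(alpha - 1) free parameters."""
    k = num_states * (alphabet_size - 1)
    data_cost = empirical_codelength(string, next_state, initial_state)
    return data_cost + 0.5 * k * math.log2(len(string))
```

On the string `"01010101"`, a one-state (memoryless) machine costs 8 + 1.5 = 9.5 bits, while a two-state machine whose state is the previous symbol costs 0 + 3 = 3 bits, so the rule selects the two-state model.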

It  was  also  shown  [2]  that  a  similar  codelength  is  obtained  if  instead 
of  sending  the  empirical  counts  explicitly,  they  are  estimated  sequen¬ 
tially  and  used  at  each  time  instance  to  encode  the  next  symbol.  F  is 
still  estimated  from  the  entire  data  in  a  batch  procedure  by  minimizing 
(3),  and  is  sent  as  a  header.  The  minimal  codelength  attained  by  this 
universal  coding  procedure  was  termed  semi-predictive  complexity. 

The most interesting case is a fully predictive one where, in addition
to the parameters, $F$ is also estimated sequentially. Thus, such an
algorithm assigns to each symbol a probability that depends only on
past outcomes and hence can be used to define a universal process. It
was conjectured in [2], [3] that the fully predictive stochastic complexity
is also asymptotically equivalent to the non-predictive stochastic
complexity. The main result of this work is a proof of this conjecture
in a probabilistic setting, where it is assumed that the data is generated
by some FSM source. Specifically, if at each time $t$, $F$ is estimated by

F(t)  =  arg mm  F)  -h  +  ^)j  ^  (5) 

where the minimum is taken over all FSM models $F$ and $C > 1 + 1/2\alpha$,
then the resulting expected codelength approaches the entropy at the
optimal rate of $(k/2)(\log n/n)$ where $k = S(\alpha - 1)$ is the number of
parameters. Note that the criterion used in (5) is slightly different from
a sequential MDL criterion in which the model is estimated sequentially
by minimizing an expression similar to (3). The difference is only in
the constant of the “penalty term”, and not in its functional behavior.
Nevertheless the resulting fully predictive complexity is the optimal
one, since its expected value is

$$H_n(X; F) + \frac{k}{2} \frac{\log n}{n} \qquad (6)$$

up to an $O(n^{-1})$ term. This result can be viewed as proving the existence
of universal FSM sources.

To  prove  this  main  result  we  use  the  observation  [4]  that  a  sufficient 
condition  for  achieving  the  optimal  coding  rate  is  that  the  estimator  of 
F  satisfies 

$$\sum_{t=1}^{\infty} P_e(t) \log t < \infty \qquad (7)$$

where $P_e(t)$ is the probability of error in estimating the model at time $t$.
We show that the estimator (5) is strongly consistent and furthermore
satisfies (7). The detailed proof is given in [5].

The  fully  sequential  compression  algorithm  presented  here  is  not 
efficient.  In  a  related  work  [6]  we  have  considered  the  effective  context 
algorithm  and  have  shown  that  for  the  restricted  class  of  tree  sources 
its  average  codelength  also  approaches  the  entropy  at  the  optimal  rate. 

Let $\{X_i\} \sim P_\theta$, $\theta \in \Lambda \subset \mathbb{R}^K$. Rissanen has shown that there
exist universal noiseless codes for $\{X_i\}$ with per-letter rate
redundancy as low as $(K/2)(\log N)/N$, where $N$ is the blocklength
and $K$ is the number of source parameters. We derive an
analogous result for universal source coding with respect to
the squared error fidelity criterion: there exist codes with
per-letter rate redundancy as low as $(K/2)(\log N)/N$ and per-letter

distortion  (averaged  over  and  6)  at  most  D(f2)[l  +  ^], 
where $D(R)$ is an average distortion-rate function and $K$ is
now the number of parameters in the code.


Let $\{X_i\}$ be a random process over alphabet $\mathcal{X}$ with process
measure $P_\theta$, $\theta \in \Lambda$, and let $q^N = \beta^N \circ \alpha^N$ be a length-$N$ block
code with encoder $\alpha^N : \mathcal{X}^N \to C$ and decoder $\beta^N : C \to \mathcal{Y}^N$,
where $C = \{c_1, \ldots, c_M\} \subset \{0,1\}^*$ is some binary prefix code
and $\mathcal{Y}$ is the reproduction alphabet, typically equal to $\mathcal{X}$.
A universal source code with respect to a fidelity criterion
$d(X^N, Y^N) = \sum_i d(X_i, Y_i)$ is a sequence of block codes $\{q^N\}$
such that for each $\theta \in \Lambda$ there exists a corresponding sequence
of points $\{(R_{N,\theta}, D_{N,\theta})\}$ on the graph of the $N$th order operational
distortion-rate function for $P_\theta$ for which the per-letter
“rate” redundancy

$$\frac{1}{N} \rho_\theta(q^N) = \frac{1}{N} E_\theta |\alpha^N(X^N)| - R_{N,\theta} \qquad (1)$$

and the per-letter “distortion” redundancy

$$\frac{1}{N} \delta_\theta(q^N) = \frac{1}{N} E_\theta d(X^N, q^N(X^N)) - D_{N,\theta} \qquad (2)$$

each go to zero uniformly in $\theta$ (in which case the code is strongly
minimax universal), pointwise in $\theta$ (in which case the code is
weakly minimax universal), or in expectation with respect to a
probability measure on $\theta$ (in which case the code is weighted
universal). In the noiseless case, where $D_{N,\theta} = 0$ and $R_{N,\theta} =
H_\theta(X^N)$, Rissanen [1] has shown that when $\Lambda \subset \mathbb{R}^K$ is compact
with a non-empty interior, and $\{P_\theta\}$ satisfies certain regularity
conditions, there exists a universal code $\{q^N\}$ with per-letter
rate redundancy

$$\frac{1}{N} \rho_\theta(q^N) \le \frac{K}{2} \frac{\log N}{N} + o\left(\frac{\log N}{N}\right) \qquad (3)$$

for each $\theta \in \Lambda$. (Hence the code is weakly minimax universal.)
Rissanen goes on to show that this is also the minimum redundancy
achievable by any universal code $\{q^N\}$, for almost all $\theta$
(with respect to Lebesgue measure).


the $M$ reproduction codewords $\{\beta^k(c)\}$ are themselves encoded
using a fixed “universal” vector quantizer optimized for the distribution
of the $\beta^k(c)$s (averaged over all $c$, and $\theta$). In the
second  part  of  the  encoding,  ,Y*, . . .  ,-Y*  are  encoded  using  the 
quantized  code  q*  =  o  q*. 


Let $R = \log M / k$ be the rate in bits per letter of $q^k$, and
let $D_k(R) = E D_{k,\theta}(R)$ be the average of the $k$th order operational
distortion-rate function $D_{k,\theta}(R)$. We use a high resolution
approximation for $D_{k,\theta}(R)$ and a Lagrangian formulation
to determine the optimal bit allocation between $q^k$ and the
“universal” quantizer. For this optimal allocation, we show that
(with $R_{N,\theta} = R$ in (1) and $D_{N,\theta} = D_{N,\theta}(R)$ in (2)) the per-letter
rate and distortion redundancies for the overall code $q^N$ are


A'log;V  f\ogN\ 

J  iV  A  j 


(4) 


i£6e(q”*)  Dk{R)  [l  +  ^]  -  Dy{R),  (5) 

where $K = Mk$ is the total number of parameters in the code
$q^N$. The same results hold in the case where the codewords
$\beta(c)$ are scalar quantized. The recent work of Zeger, Bist, and
Linder [4] supports these results.


While our rate redundancy result (4) for universal source coding
with respect to a fidelity criterion is consistent with Rissanen's
result (3) for universal noiseless coding, our distortion
redundancy result (5) is consistent with Akaike's result on the
expected decrease in log likelihood for empirical maximum likelihood
on N samples, with Davisson's result on the expected
increase in squared error for empirical linear prediction on N
Gaussian samples, and with Pollard's result that the codewords
in a quantizer follow a central limit theorem (which implies that
the expected increase in squared error is inversely proportional
to the number of samples N).

INFORMATION BOUNDS FOR THE RISK OF BAYESIAN PREDICTIONS
AND THE REDUNDANCY OF UNIVERSAL CODES

Andrew Barron, Bertrand Clarke, and David Haussler
Yale Univ., Univ. British Columbia, and Univ. California at Santa Cruz


ABSTRACT: Several diverse problems have solutions in terms of an
information-theoretic quantity for which we examine the asymptotics. Let
$Y_1, Y_2, \ldots, Y_N$ be a sample of random variables with distribution depending
on a (possibly infinite-dimensional) parameter $\theta$. The maximum of the mutual
information $I_N = I(\theta; Y_1, Y_2, \ldots, Y_N)$ over choices of the prior distribution of $\theta$
provides a bound on the cumulative Bayes risk of prediction of the sequence of
random variables for several choices of loss function. This same quantity is the
minimax redundancy of universal data compression and the capacity of certain
channels. General bounds for this mutual information are given. A special case
concerns the estimation of binary-valued functions with Vapnik-Chervonenkis
dimension $d_{vc}$, for which the information is bounded by $d_{vc} \log N$. For smooth
families of probability densities with a Euclidean parameter of dimension $d$,
the information bound is $(d/2) \log N$ plus a constant. The prior density
proportional to the square root of the Fisher information determinant is the
unique continuous density that achieves a mutual information within $o(1)$ of the
capacity for large $N$. The Bayesian procedure with this prior is asymptotically
minimax for the cumulative relative entropy risk.

SUMMARY: A parameterized family of distributions $P_{Y^N|\theta}$ is used
to model a sequence of random variables $Y^N = (Y_1, Y_2, \ldots, Y_N)$. For problems
of data compression and on-line prediction we compare the performance
that can be achieved when $\theta$ is unknown to the performance that would be
achieved if it were known. Entropy and probability of error, respectively, can
be used to measure the performance. The relative entropy is used to bound the
additional risk due to lack of knowledge of the parameter. If $\theta$ were known, the
best on-line prediction and compression of the sequence of variables $Y_t$ would
be available from the conditional distribution $P_{Y_t|Y^{t-1},\theta}$. If $\theta$ is unknown,
these actions may be based on an estimate of the conditional distribution using
the observed past. When a prior distribution is assigned to the parameter,
Bayesian procedures use the distribution $P_{Y_t|Y^{t-1}}$ obtained by averaging out
the parameter. We examine the cumulative relative entropy distance between
these predictive distributions. By the chain rule this quantity reduces to the
relative entropy $D_{N,\theta} = D(P_{Y^N|\theta} \| P_{Y^N})$ between the joint distributions of $Y^N$,
with and without conditioning on $\theta$. In statistical terminology, $D_{N,\theta}$ is the
cumulative risk, when relative entropy is used as the loss function. Averaging
with respect to the prior distribution of $\theta$ yields the mutual information $I_N$
as the (cumulative) Bayesian risk. Maximizing the Bayes risk $I_N$ with respect
to the choice of the prior for $\theta$ yields the information capacity $C_N$ and
determines the sequence of Bayes estimators of the conditional distribution that
are minimax, i.e., that minimize the maximum value of $D_{N,\theta}$. In situations
where determination of the exact asymptotics of $I_N$ is not possible, bounds on
$I_N$ may be used to provide bounds on the minimax cumulative risk.
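
The relation between $D_{N,\theta}$ and $I_N$ can be made concrete for a toy family. The Python sketch below assumes i.i.d. Bernoulli($\theta$) observations and a prior supported on finitely many parameter values; both choices, and the function name, are illustrative assumptions rather than anything taken from the paper. Quantities are in nats.

```python
import math


def relative_entropies_and_information(thetas, weights, n):
    """Return ([D_{N,theta} for each theta], I_N) for i.i.d. Bernoulli(theta) data.

    thetas  : support points of the prior (assumed finite)
    weights : prior probabilities of the support points (positive, summing to 1)
    n       : sample size N
    """
    def seq_prob(theta, k):
        # Probability of one particular binary sequence containing k ones.
        return theta ** k * (1.0 - theta) ** (n - k)

    # Bayes mixture probability of a sequence with k ones.
    mix = [sum(w * seq_prob(t, k) for t, w in zip(thetas, weights))
           for k in range(n + 1)]
    divergences = []
    for t in thetas:
        d = 0.0
        for k in range(n + 1):
            p = seq_prob(t, k)
            if p > 0.0:
                # math.comb(n, k) sequences share the same probability.
                d += math.comb(n, k) * p * math.log(p / mix[k])
        divergences.append(d)
    info = sum(w * d for w, d in zip(weights, divergences))
    return divergences, info
```

With a two-point prior the mutual information can never exceed $\log 2$ nats, and it grows toward that value as $N$ increases, since more observations make the two parameter values easier to distinguish.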

In universal noiseless coding of discrete random variables, the redundancy
$R_{N,\theta}$ of a code is the increase in the expected total codelength due to
the lack of knowledge of the parameter value. For the code based on $P_{Y^N}$,
the relative entropy $D_{N,\theta}$ is the redundancy; the information $I_N$ is the average
redundancy; the information capacity $C_N$ is the minimax redundancy; and
the choice of the prior that achieves the capacity provides the minimax code
(Davisson 1973, Davisson and Leon-Garcia 1980).

In the on-line prediction problem, we let the regret $r_{N,\theta}$ be defined as
the increase in the expected frequency of mistakes in predicting the values of
the sequence, due to the lack of knowledge of the parameter value. The regret
of the sequence of Bayesian predictions is bounded by

$$r_{N,\theta} \le (2 D_{N,\theta}/N)^{1/2}.$$

Thus the regret converges to zero if the relative entropy is of smaller order than
$N$. A tighter bound between $r_{N,\theta}$ and $D_{N,\theta}$ is possible if the sequence of conditional
distributions satisfies an $a$-separation property, that is, for some $a > 0$,
the difference between the first and second largest values of $P(Y_t = y \mid Y^{t-1}, \theta)$
is never less than $a$. In this case, the regret of the Bayesian predictions is
shown to be bounded by $r_{N,\theta} \le (2/a) D_{N,\theta}/N$. Averaging with respect to the
prior yields the Bayes average regret

$$r_N \le (2/a) I_N / N.$$
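
The two regret bounds are easy to compare numerically. The short Python sketch below simply evaluates them; the function names are hypothetical, and it is an assumption here that the relative entropy is measured in nats.

```python
import math


def regret_bound(divergence, n):
    """General bound (2 D / N)^(1/2) on the regret of the Bayesian predictions."""
    return math.sqrt(2.0 * divergence / n)


def regret_bound_separated(divergence, n, a):
    """Tighter bound (2 / a) D / N, valid under the a-separation property."""
    return (2.0 / a) * divergence / n
```

For example, if the relative entropy grows like $(1/2) \log N$, the general bound decays roughly as $(\log N / N)^{1/2}$, whereas the bound under $a$-separation decays as $(\log N)/N$.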

A basic role in the analysis of the asymptotics of the mutual information
is played by the relative entropy $D(P_{Y^N|\theta} \| P_{Y^N|\theta'})$ between the distributions at
neighboring parameter points $\theta$ and $\theta'$. It is shown that the mutual information
is bounded by

$$I_N \le \inf_{\Pi} \{D_N(\Pi) + H(\Pi)\}$$

where the infimum is over partitions $\Pi$ of the parameter space. Here $D_N(\Pi)$ is
the average diameter of the cells of the partition as measured by the relative
entropy distance and $H(\Pi)$ is the entropy of the discrete random variable
induced by the partition. This bound may be used to show that for certain
“nonparametric” cases $I_N$ is of order $N^p$ with $0 < p < 1$. We also give finite
and infinite dimensional cases where $I_N$ is of order $\log N$. So the price for lack
of knowledge of the parameter is small compared to the total entropy.

In these bounds, we are permitted to have a sequence of exogenous
input variables $X_1, X_2, \ldots, X_N$ on which the distributions are conditioned.
For example the $Y_t$ may equal a function $f_\theta(X_t)$ corrupted by noise. Of
particular interest is the case that the $Y_t$ variables are binary-valued and equal
$f_\theta(X_t)$ plus independent Bernoulli($\lambda$) noise (modulo 2), where $f_\theta(\cdot)$ is a given
family of binary-valued functions of Vapnik-Chervonenkis dimension $d_{vc}$, and
the noise rate satisfies $0 < \lambda < 1/2$. Then for any prior distribution on $\theta$,

$$I_N \le d_{vc} \log(eN/d_{vc}).$$

It follows that for the on-line Bayesian prediction of $Y_1, Y_2, \ldots, Y_N$ the relative
frequency of errors has average that exceeds the noise level $\lambda$ by not more than
a multiple of $(d_{vc}/N) \log(N/d_{vc})$. Likewise for universal data compression, the
length of the Shannon code based on the Bayesian model for $Y_1, Y_2, \ldots, Y_N$,
divided by the sample size $N$, has average that exceeds the noise entropy $h(\lambda)$
by not more than $(d_{vc}/N) \log(eN/d_{vc})$.

Refined results are possible in the case of smooth parametric families
of densities $p(y|\theta)$ indexed by a finite-dimensional parameter vector $\theta$. Here
$Y_1, Y_2, \ldots, Y_N$ are assumed to be independent and identically distributed when
conditioned on the parameter. An asymptotic expression for the mutual
information $I_N$ of the form $(d/2) \log N + c(p) + o(1)$ has been determined by
Ibragimov and Hasminskii (1973), in which the constant $c(p)$ is precisely determined
as a function of the prior density $p(\theta)$. (Somewhat stringent conditions are
required for their result; see Efroimovich 1980, Clarke 1989 for other
formulations of conditions.) Here $d$ is the Euclidean dimension. A related asymptotic
expression for $D_{N,\theta}$ is given in Clarke and Barron (1990). This leads us to
examine the asymptotics of the capacity $C_N$ and the choices of prior
distributions for $\theta$ that asymptotically achieve this capacity. For each finite $N$ the
optimizing prior distribution is generally discrete (Berger and Bernardo 1989,
Zhang and Hartigan 1992). Nevertheless, we show under general smoothness
conditions that a unique continuous density $p(\theta)$ achieves a value $I_N$ within
$o(1)$ of the capacity $C_N$. As conjectured by Bernardo (1979), it is Jeffreys'
prior, i.e., the prior proportional to the square root of the determinant of the
Fisher information matrix. No other prior (continuous or discrete) achieves an
asymptotically larger value of the mutual information.
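
As a small worked illustration (not taken from the abstract itself), the prior described above can be written out for the simplest smooth family, the Bernoulli model with $d = 1$. The choice of family is an assumption made only for this example; the Fisher information and the resulting Beta(1/2, 1/2) density are standard facts.

```latex
% Bernoulli family: p(y|\theta) = \theta^{y} (1-\theta)^{1-y},  y \in \{0,1\},  0 < \theta < 1.
% Fisher information of one observation:
\[
  I(\theta) = E_\theta\!\left[ -\frac{\partial^2}{\partial\theta^2} \log p(Y|\theta) \right]
            = \frac{1}{\theta} + \frac{1}{1-\theta}
            = \frac{1}{\theta(1-\theta)} .
\]
% Prior proportional to the square root of the (1x1) Fisher information determinant:
\[
  p(\theta) = \frac{\sqrt{I(\theta)}}{\int_0^1 \sqrt{I(u)}\,du}
            = \frac{1}{\pi\sqrt{\theta(1-\theta)}} ,
\]
% i.e. the Beta(1/2, 1/2) density, since \int_0^1 u^{-1/2}(1-u)^{-1/2}\,du = \pi.
```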

We give a further asymptotic decision-theoretic property of the optimal
prior. Jeffreys' prior is shown to be asymptotically least favorable, that is,
the minimax statistical risk $\inf \max_{\theta} D_{N,\theta}$ (which also equals the capacity
$C_N$) is achieved asymptotically by the Bayesian procedure with Jeffreys' prior,
uniquely among continuous priors. Moreover, with this choice of prior, $D_{N,\theta}$
is asymptotically independent of the parameter $\theta$, so that, in this case, the
relative entropy $D_N$, the mutual information $I_N$, and the capacity $C_N$ are
asymptotically the same.

There is no Universal Source Code for Infinite Alphabet

Laszlo  Gyorfi*,  Istvan  Pali*,  and  Edward  C.  van  der  Meulen^ 


The vast majority of results in information theory concern situations
where the actual probability law is known. When applying information
theory to real-life problems, an obvious question is whether the
probability law can be learned from data as far as information theory
is concerned. In noiseless source coding, for example, if the source
alphabet is finite, then the answer to this question is yes, since there
are good universal source coding procedures (see e.g. [2]). This paper
is on coding for a discrete infinite source alphabet, showing that there
is no universal source code over the class of discrete memoryless
sources with infinite source alphabet and finite entropy.

Let $X$ be a random variable taking values in $\mathcal{X} = \{1, 2, 3, \ldots\}$
with distribution $\mu$ and entropy $H(X) < \infty$. A discrete memoryless
source $\{X_i\}$ with the marginal distribution $\mu$ is considered.

For a discrete memoryless source let $f_n$ be a variable length uniquely
decodable code with source block length $n$. Let the average codeword
length of $f_n$ be denoted by $L_n$. The redundancy per letter of $f_n$ is
defined by

$$R_n = \frac{1}{n}\bigl(L_n - H(X_1, \ldots, X_n)\bigr).$$
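
The definition of redundancy per letter can be sketched numerically. The example below is only an illustration under stated assumptions: it uses a finite-support memoryless source (unlike the infinite alphabet studied in this paper) and the Shannon block code with lengths $\lceil -\log_2 P(x_1^n) \rceil$, which satisfy the Kraft inequality; the function name `redundancy_per_letter` is hypothetical.

```python
import math
from itertools import product

def redundancy_per_letter(p, n):
    """R_n = (L_n - H(X_1..X_n)) / n for a Shannon block code.

    Assumptions for this sketch: p lists the positive letter probabilities of a
    memoryless source with finite support, and each block of n letters gets a
    codeword of length ceil(-log2 P(block)).
    """
    p = [q for q in p if q > 0]
    entropy = -sum(q * math.log2(q) for q in p)      # H(X) per letter, in bits
    avg_len = 0.0                                    # L_n, average codeword length
    for block in product(p, repeat=n):               # all blocks, as tuples of letter probabilities
        prob = math.prod(block)
        avg_len += prob * math.ceil(-math.log2(prob))
    # For a memoryless source H(X_1, ..., X_n) = n * H(X).
    return (avg_len - n * entropy) / n

if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, redundancy_per_letter([0.7, 0.2, 0.1], n))
```

For this code the redundancy per letter stays below $1/n$, which is the finite-alphabet behaviour that, according to Theorem 3 below, cannot be guaranteed over all finite-entropy sources on an infinite alphabet.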

There is a well-known duality between universal coding and distribution
estimation consistent in information divergence, namely, there is a
universal source code over a subset of the set of all discrete
memoryless sources with finite entropy if and only if there is a
distribution estimate consistent in information divergence for all
sources within this subset. For the aim of this paper the important
direction of this equivalence is as follows:

Theorem 1 For any uniquely decodable code $f_n$ we can construct a
distribution estimate $\mu_n$ such that

$$R_n \ge E\{I(\mu, \mu_n)\}.$$

Theorem 2 If $\{\mu_n\}$ is an arbitrary sequence of estimates of $\mu$ then
there is a $\mu$ with $H(X) < \infty$ such that we have

$$I(\mu, \mu_n) = \infty \quad \text{for all } n \ge 1 \text{ a.s.}$$

As in Davisson [1], a sequence of uniquely decodable codes
$f_1, f_2, \ldots$ is called weakly universal for a class of sources if

$$\lim_{n \to \infty} R_n = 0$$

for all sources in this class.

The following theorem implies that there is no universal code for the
class of discrete memoryless sources with finite entropy.

Theorem 3 For any sequence of source codes $\{f_n\}$ there is a memoryless
source with finite entropy such that

$$R_n = L_n = \infty \quad \text{for all } n.$$

SUMMARY

Let $s$ be a discrete $m$-ary source over alphabet $A$, let $f_n$ be a
block-to-variable encoding method for blocks of length $n$, and let
$r_n(f_n, s)$ and $\rho_n(f_n, s)$ be, respectively, the "average" redundancy and the
"maximal" (over all blocks of length $n$) "individual" redundancy
of encoding source $s$ with code $f_n$. For a given set $S$ of sources $s$
the efficiency of encoding $f_n$ is estimated with
$r_n(f_n, S) = \sup\{r_n(f_n, s),\ s \in S\}$ or
$\rho_n(f_n, S) = \sup\{\rho_n(f_n, s),\ s \in S\}$.

For all the considered sets of unifilar sources the maximal
probability codes or MP-codes $f_n^* = f_n^*(S)$ [1] satisfy the
inequality

$$r_n(f_n^*, S) \le \rho_n(f_n^*, S) \le \frac{1}{2n}\,[\alpha(S) \log n + \beta(S)] \qquad (1)$$

where $\alpha(S)$ is the number of independent parameters of the
distributions in $S$, excluding a priori distributions (for sources
with memory), and $\beta(S)$ is independent of $n$. In most
cases these results are asymptotically optimal.

For many reasons we need to widen the sets of sources under
consideration. A finite-state $m$-ary source $s$ is described
by conditional probabilities $\theta(a, u \mid u')$, where $a$ and $u$
correspond to an arbitrary moment and $u'$ is the preceding
state. But the set $S(m, w)$ of all finite-state sources with a
given alphabet $A$ and a set $U$ of $w$ states is very large: it
is described by $\alpha(S(m, w)) = (mw - 1)w$ independent
parameters of conditional probabilities. So we need to define
and to consider reasonable subsets of $S(m, w)$.

A switching source $s = (s_0, s_1, \ldots, s_M)$ (see also [2])
consists of $M$ subsources $s_1, \ldots, s_M$, which generate letters of
alphabet $A$ independently of one another, and of a
control source $s_0$, which performs the sequential commutation
of the subsources' outputs with the output of source $s$. After
subsource $s_i$ is switched off it continues to generate "blind"
letters during $l_i \ge 0$ steps and then stops (if during these
$l_i$ steps it is not switched on again). The blind letters
influence the probabilities of the next letters at the output
of $s_i$ but not the output of $s$. The statistical properties
of the subsources in the "switched on" and "switched off"
modes can be different. Let $s_0, s_1, \ldots, s_M$ be chosen
independently from sets $S_0, S_1, \ldots, S_M$ correspondingly.
Different sets $S = S_0 \times S_1 \times \cdots \times S_M$ were considered.
The main results are as follows.

Theorem 1. If $S_0, S_1, \ldots, S_M$ are sets of finite-state
sources then inequality (1) is true for the corresponding set $S$
of switching sources, where

$$\alpha(S) = \sum_{i=0}^{M} \alpha(S_i) \qquad (2)$$

is the number of independent parameters.

$S(m, w)$ is just the particular case of $M$ memoryless
(and stable) components, and I. Csiszar proved
inequality (2) for it. But in the general case $S_1, \ldots, S_M$ can
contain both stable and unstable components.

Theorem 2. The sequential universal encoding for a set
$S$ of switching sources with finite-state subsources and a
finite-state control source satisfies (1) and (2) and
needs not more than

$$A = O(n^{\gamma(S)+1}), \quad n \to \infty \qquad (3)$$

arithmetic computations and memory cells of fixed size,
where $\gamma(S)$ is not less than $\alpha(S)$ and not more than the total
number of parameters in $S$.

REFERENCES 

[1] Yu. M. Shtarkov, "Sequential Universal Encoding of
Single Messages", Problemy Peredachi Informatsii,
vol. 23, no. 3, 1987, pp. 3-17.

[2] T. Berger, Rate-Distortion Theory: A Mathematical
Basis for Data Compression. New Jersey, Prentice-Hall,
1971.



Abstract. The problem of encoding sources with unknown statistics is
considered. The efficiency of the codes is estimated by three
characteristics: i) the redundancy, defined as the difference between the
average codeword length and the Shannon entropy; ii) the memory size (in
bits) of the coder and decoder program when it is realized on a computer
(S); and iii) the average time of encoding and decoding a single letter
(T). The time is measured by the number of operations on single-bit words.
All of the known methods may be divided into two classes. The Ziv-Lempel
codes and their variants [1] fall under the first class, and the arithmetic
code [2] with the Lynch-Davisson code [3] fall under the second one. The
codes from the first class need exponential memory size $S = O(\exp(1/r))$
to achieve the redundancy $r$ as $r$ tends to 0. The methods from the second
class have small memory size as well as a low rate of encoding:
$S = O(1/r)^{\mathrm{const}}$, T = O(1/r (log(1 …)). In this report we present
a code that combines the merits of both methods: the memory size is small
and the rate is high, $S = O(1/r^{\mathrm{const}})$, $T = O((\log 1/r)^{\ldots})$. We
called this method the PAST code. We consider the encoding of Bernoulli
sources only, but it is obvious how to carry the results over to Markov
sources.

The PAST code. We consider the main idea of the PAST code for the case of
encoding a source with known probabilities. Let $A = \{a_1, a_2, \ldots, a_n\}$
be the alphabet of a source and $A^m$ the set of words of length $m$,
$m \ge 1$. Let us assign the lexicographic order on $A^m$. Let $p(a)$ be the
probability of the letter $a \in A$. We suppose that all $p(a)$, $a \in A$,
have the form of a binary fraction with $t$ digits ($t \ge \lceil \log n \rceil$).
For every word $U = u_1 \ldots u_m \in A^m$ we define

$$P(U) = p(u_1) \cdots p(u_m); \qquad Q(a_1 a_1 \ldots a_1) = 0; \qquad
Q(U) = \sum_{V < U;\ V \in A^m} P(V); \qquad R(U) = Q(U) + P(U)/2.$$

The code of the word $U$ consists of the first $\lceil -\log P(U) \rceil + 1$
binary digits of $R(U)$. This is the known Gilbert-Moore alphabetical code.
It can be deciphered and its redundancy is equal to $2/m$. The PAST method
is based on this code. The main problem is to compute $P(U)$ and $R(U)$
rather rapidly. Let $m = 2^s$. Let us define

$$\Pi_1^1 = p(u_1), \ldots, \Pi_m^1 = p(u_m), \qquad
\lambda_1^1 = Q(u_1), \ldots, \lambda_m^1 = Q(u_m),$$

$$\Pi_k^l = \Pi_{2k-1}^{l-1}\,\Pi_{2k}^{l-1}, \qquad
\lambda_k^l = \lambda_{2k-1}^{l-1} + \Pi_{2k-1}^{l-1}\,\lambda_{2k}^{l-1}, \qquad
l = 2, \ldots, s+1; \quad k = 1, \ldots, m/2^{l-1}.$$

It is easy to see that

$$R(U) = \lambda_1^{s+1} + \Pi_1^{s+1}/2; \qquad P(U) = \Pi_1^{s+1}.$$

Let us consider an example. Let $A = \{a_1, a_2\}$, $p(a_1) = .11$,
$p(a_2) = .01$ (binary fractions), $t = 2$, $m = 4$, $U = a_2 a_1 a_1 a_2$. Then

$\Pi_1^1 = .01$, $\Pi_2^1 = .11$, $\Pi_3^1 = .11$, $\Pi_4^1 = .01$,
$\Pi_1^2 = .01 \cdot .11 = .0011$, $\Pi_2^2 = .11 \cdot .01 = .0011$,
$\Pi_1^3 = .0011 \cdot .0011 = .00001001$,
$\lambda_1^1 = \lambda_4^1 = .11$, $\lambda_2^1 = \lambda_3^1 = .00$,
$\lambda_1^2 = .11$, $\lambda_2^2 = .1001$, $\lambda_1^3 = .11011011$,
$R(U) = \lambda_1^3 + \Pi_1^3/2 = .110111111$, $\lceil -\log \Pi_1^3 \rceil + 1 = 6$.

Consequently, code(U) = 110111. As is obvious from this example, the main
part of the calculation is carried out on short words. So the overall time
of calculation is rather small. The decoding is also based on using
$\{\lambda_k^l\}$ and $\{\Pi_k^l\}$. The complexity of decoding and coding is
similar. The universal PAST code is based on this algorithm and on the
author's method of fast estimation of the probabilities $p(a_1), \ldots$ [4].
It should be noted that the complexity of the code for a source with known
statistics is less: $T(r) = O((\log(1/r))^{\ldots} \log\log(1/r))$.
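
The worked example above can be checked with a short sketch. It is an illustration only, not the author's implementation: it assumes the pairwise recursion $\Pi_k^l = \Pi_{2k-1}^{l-1}\Pi_{2k}^{l-1}$, $\lambda_k^l = \lambda_{2k-1}^{l-1} + \Pi_{2k-1}^{l-1}\lambda_{2k}^{l-1}$, which reproduces every intermediate value listed in the example, and it uses exact rational arithmetic instead of $t$-digit binary words. The function name `gilbert_moore_code` and its arguments are hypothetical.

```python
from fractions import Fraction

def gilbert_moore_code(word, p, order):
    """Gilbert-Moore codeword of `word`, computed by pairwise merging.

    word  : list of letters, length a power of two (m = 2^s)
    p     : dict letter -> probability (Fraction)
    order : letters in lexicographic order
    """
    m = len(word)
    assert m > 0 and (m & (m - 1)) == 0, "word length must be a power of two"
    # Q(a): total probability of the letters preceding a.
    q, acc = {}, Fraction(0)
    for a in order:
        q[a] = acc
        acc += p[a]
    pi = [Fraction(p[u]) for u in word]   # level-1 values Pi_k
    lam = [q[u] for u in word]            # level-1 values lambda_k
    while len(pi) > 1:                    # merge neighbouring pairs, level by level
        lam = [lam[k] + pi[k] * lam[k + 1] for k in range(0, len(pi), 2)]
        pi = [pi[k] * pi[k + 1] for k in range(0, len(pi), 2)]
    P, R = pi[0], lam[0] + pi[0] / 2      # P(U) and R(U) = Q(U) + P(U)/2
    # ceil(-log2 P) computed exactly: smallest L with 2^-L <= P.
    L = 0
    while Fraction(1, 2 ** L) > P:
        L += 1
    length = L + 1
    bits = int(R * 2 ** length)           # first `length` binary digits of R(U)
    return format(bits, "0{}b".format(length))

if __name__ == "__main__":
    p = {"a1": Fraction(3, 4), "a2": Fraction(1, 4)}   # .11 and .01 in binary
    print(gilbert_moore_code(["a2", "a1", "a1", "a2"], p, ["a1", "a2"]))
```

Each merge multiplies words that are only twice as long as those of the previous level, which is the sense in which most of the work is done on short words.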

Abstract: This paper describes the construction of a universal code for
minimizing L. D. Davisson's minimax redundancy in a range where the true
model and stochastic parameters are unknown.

Universal coding can be generally described as a compression method for
sources with an unknown or incompletely specified probability distribution
[1]. The specific problem investigated here is the development of a
universal code that minimizes the redundancy, i.e., the difference
$r_n(l, \theta)$ between the expected per-symbol length of the codeword
generated by the code's length function $l$ [2], and the per-symbol entropy
$H(\theta)$ of the source $\theta$, for each source in a predetermined range
$\Lambda$ (not for a specific source $\theta$ [1,3,4]), where $E_\theta[\,\cdot\,]$,
$n$, and $l$ are respectively the expectation on the source $\theta$, the
length of the data to be compressed $x_1^n = x_1 x_2 \ldots x_n$, and the
length function determining the codeword length. In Davisson [1], coding
scheme universality is strictly defined as the property that the redundancy
$r_n(l, \theta)$ converges to zero uniformly over all sources in the range
$\Lambda$, by taking $n$ sufficiently large (strong universality). This
property requires redundancy minimization for finite sequences as well as
asymptotic optimality (weak universality), and is assured by minimizing
the minimax redundancy $\sup_{\theta \in \Lambda} r_n(l, \theta)$ for each $n$ in the
range $\Lambda$ [1].

The primary goal of the present paper is to determine the length function
$l$ which minimizes the minimax redundancy $\sup_{\theta \in \Lambda} r_n(l, \theta)$ in
the range $\Lambda$ where both the model and stochastic parameters of each
source $\theta$ in the range $\Lambda$ are unknown. Furthermore, it is assumed that
the model is included in the set of Markov models, with the stochastic
parameters being the probabilities that each symbol occurs based on each
state in each model.

First, it is shown that universal coding is reduced to determining the
weight function $w(\theta)$ which generates the length function $l(x_1^n)$ [1] as

$$l(x_1^n) = -\log \sum_{\theta \in \Lambda} \bigl[ w(\theta) P(x_1^n | \theta) \bigr], \qquad (1)$$

where $P(x_1^n | \theta)$ is the probability that the data to be compressed is
$x_1^n$ based on the source $\theta$.
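
A minimal sketch of the length function (1), under simplifying assumptions that are not in the abstract: the range is a finite set of Bernoulli (memoryless binary) sources, the weights are given, and logarithms are base 2. The name `mixture_code_length` is hypothetical.

```python
import math

def mixture_code_length(x, thetas, weights):
    """l(x) = -log2 sum_theta w(theta) P(x | theta) for Bernoulli sources.

    x       : list of 0/1 symbols
    thetas  : candidate probabilities of the symbol 1 (assumed finite range)
    weights : nonnegative weights w(theta) summing to at most 1
    """
    ones = sum(x)
    zeros = len(x) - ones
    mixture = sum(w * (t ** ones) * ((1 - t) ** zeros)
                  for t, w in zip(thetas, weights))
    return -math.log2(mixture)

if __name__ == "__main__":
    x = [1, 0, 1, 1, 1, 0, 1, 1]
    thetas = [0.2, 0.5, 0.8]
    print(mixture_code_length(x, thetas, [1 / 3] * 3))
```

Because the mixture is at least $w(\theta) P(x_1^n|\theta)$ for every $\theta$ in the range, the length exceeds the ideal length $-\log P(x_1^n|\theta)$ by at most $-\log w(\theta)$, which is why the choice of the weight function governs the redundancy.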

Secondly, the weight function for the framework of state decomposition [5]
with a known model is presented as

w(0)  =  u»b](p*<»’)  =  n  (2) 

* $\Gamma(x)$ is the gamma function of $x$.


where $S(g)$ and $a$ are respectively the number of the states
$s = 1, 2, \ldots, S(g)$ and the number of the symbols, the single parameter
$\alpha > 0$ is selected so that the minimax redundancy
$\max_{\theta \in \Lambda} r_n(l, \theta)$ is minimized, and the occurrence probabilities
of the symbols $q = 0, 1, \ldots, a - 1$ in the same state $s$ are $p[q, s, g]$.
This result is an extension of previous results [3] for composite sources
with a known model. A general form of the weight function with an unknown
model is then presented as

mW  =  fi(ff)«'[»](p*‘*’).  (3) 

where $\sum_g h(g) \le 1$ and the function $h$ is selected so that the minimax
redundancy $\max_{\theta \in \Lambda} r_n(l, \theta)$ is minimized, in order to formulate
a universal coding method when the model is unknown.

Finally, it is shown that the minimax redundancy achieved with the
presented coding method (available for sequential, or adaptive, coding [6])
is upper-bounded by the minimax redundancy achieved by J. Rissanen's
semi-predictive coding method [7].

Topics for a future study include developing a method to determine the
value $\alpha$ for the weight function with a known model, and further
investigation into determining the function $h$ for models which minimizes
the minimax redundancy.

1 Introduction

A binary FSMX source (see [3]) generates a sequence of digits from
$\{0,1\}$, whose statistical behaviour can be described using a postfix set
$S$. This postfix set is a collection of binary strings which is proper and
complete. We can now define the state function $f(\cdot)$ which maps
semi-infinite source sequences $x_{-\infty}^{t-1} = \cdots x_{t-2} x_{t-1}$ to their
unique postfix in $S$. This $s_t = f(x_{-\infty}^{t-1})$ is the state of the source,
hence $\Pr\{X_t = x \mid x_{-\infty}^{t-1}\} = P(x \mid f(x_{-\infty}^{t-1}))$, for $x \in \{0,1\}$.
All $P(\cdot \mid s)$ for $s \in S$ are probability distributions on $\{0,1\}$. FSMX
sources with the same postfix set are said to have the same model. In the
binary case we need $|S|$ free parameters to specify all its distributions
$P(\cdot \mid s)$, $s \in S$. The number of free parameters of the actual FSMX
source determines the asymptotic redundancy for an optimal code. Sequential
source coding procedures for FSMX sources often use a context tree (see
fig. 1). The standard approach (see [4], [2], [3], and [5]) is that one uses
this context tree to estimate the current context, i.e. the actual state of
the source. This context is used to estimate the distribution of the next
source digit. Arithmetic coding procedures can then be used to encode this
next symbol with negligible coding-redundancy.

However, instead of estimating the actual state we should try to find a
good encoding distribution. This first principle immediately suggests model
weighting. Model weighting increases the (block-)model-redundancy by at
most $-\log(W(\sigma))$ where $\sigma$ is the actual model, and $W(\sigma)$ its
weighting probability. To weight an infinite number of models we introduce
a second principle which says that the model-redundancy has to be
proportional to the number of free parameters of the model. It gives us a
weighting distribution over all models. The next section describes an
efficient method that weights the block-probabilities of all models
according to this distribution.


2 The Context Tree Weighting Procedure


We assume that the maximal depth $d$ of the context tree is finite. A node
in the context tree corresponding to context $s$ contains $n_0(s)$ and
$n_1(s)$, i.e. the numbers of zeros and ones in the source sequence
$x_1 x_2 \cdots x_{t-1}$ that were preceded by $s$. We assign a block probability

$$P(n_0, n_1) = \frac{\tfrac12 (1 + \tfrac12) \cdots (n_0 - \tfrac12) \cdot \tfrac12 (1 + \tfrac12) \cdots (n_1 - \tfrac12)}{1 \cdot 2 \cdots (n_0 + n_1)} \qquad (1)$$

to a (sub-)sequence with $n_0 > 0$ zeros and $n_1 > 0$ ones, etc. This
estimator guarantees uniform convergence of the redundancy (see [1]).
Using the context tree we can now recursively define the weighted
probability corresponding to a given context as

$$Q(s) = \begin{cases}
P(n_0(s), n_1(s))/2 + Q(0s)\,Q(1s)/2 & \text{if } \mathrm{depth}(s) < d \\
P(n_0(s), n_1(s)) & \text{if } \mathrm{depth}(s) = d.
\end{cases} \qquad (2)$$

If we apply this method to the context tree in figure 1 we obtain
$Q(\lambda)$. The $Q(\lambda)$ corresponding to the sequence $x_1 x_2 \cdots x_{t-1}$ is
our weighted block probability $\Pr\{x_1, x_2, \ldots, x_{t-1} \mid x_{-\infty}^{0}\}$. To
process the next $x_t$ we first increment $n_0$ for contexts
$\lambda, x_{t-1}, x_{t-2}x_{t-1}, \ldots$, and $x_{t-d}x_{t-d+1} \cdots x_{t-1}$. Then we
update $Q(\lambda), Q(x_{t-1}), Q(x_{t-2}x_{t-1}), \ldots$, and
$Q(x_{t-d}x_{t-d+1} \cdots x_{t-1})$ in reverse order. $Q(\lambda)$ is now equal to
$\Pr\{x_1^{t-1}, X_t = 0 \mid x_{-\infty}^{0}\}$. Analogously, by incrementing $n_1$, we
can find $\Pr\{x_1^{t-1}, X_t = 1 \mid x_{-\infty}^{0}\}$. Division yields the
distribution for $x_t$. The number of operations needed is proportional to
the maximal depth $d$.
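
The recursion (2) can be sketched as follows. This is an illustrative batch version under stated assumptions, not the sequential update described above: it recomputes the counts for every context from scratch, so it is exponential in $d$ and suitable only for small examples; the names `kt` and `ctw_block_probability` are hypothetical. Exact fractions are used so that the probabilities can be checked to sum to one.

```python
from fractions import Fraction
from itertools import product

def kt(n0, n1):
    """Block probability P(n0, n1) of equation (1), with P(0, 0) = 1."""
    p = Fraction(1)
    for i in range(n0):
        p *= Fraction(2 * i + 1, 2)      # factors 1/2, 3/2, ..., n0 - 1/2
    for j in range(n1):
        p *= Fraction(2 * j + 1, 2)      # factors 1/2, 3/2, ..., n1 - 1/2
    for k in range(1, n0 + n1 + 1):
        p /= k                           # denominator 1 * 2 * ... * (n0 + n1)
    return p

def ctw_block_probability(past, seq, d):
    """Weighted block probability Q(lambda) of `seq` given `past` (0/1 strings)."""
    assert len(past) >= d, "need at least d past symbols"
    full, start = past + seq, len(past)

    def counts(s):
        # Zeros and ones of `seq` that were immediately preceded by context s.
        n = [0, 0]
        for t in range(start, len(full)):
            if full[t - len(s):t] == s:
                n[int(full[t])] += 1
        return n

    def q(s):
        n0, n1 = counts(s)
        if len(s) == d:
            return kt(n0, n1)
        # Children 0s and 1s extend the context one symbol further into the past.
        return kt(n0, n1) / 2 + q("0" + s) * q("1" + s) / 2

    return q("")

if __name__ == "__main__":
    print(ctw_block_probability("0010", "0110100", 3))
```

Dividing the weighted probabilities of two sequences that differ only by an appended symbol gives the conditional distribution that would drive an arithmetic coder.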


3  Upper  Bound  on  the  Redundancy 

Suppose we know the model $S$ of the FSMX source but not the values
of its free parameters. Then we can use $P(\lambda) := \prod_{s \in S} P(n_0(s), n_1(s))$
as  block  estimator.  This  gives  us  an  optimal  (see  [2])  parameter- 
redundancy  which  can  be  upperbounded  by  |5|(j  log(  j^)-l- J  logfs^e))- 
Inspection of the procedure in the previous section shows that the
weighted block probability $Q(\lambda) \ge 2^{-(2|S| - 1)} \cdot P(\lambda)$. Therefore the
model-redundancy of our procedure is at most $(2|S| - 1) \log 2$.

Note that the above holds for individual redundancy relative to
the actual source, but also for individual redundancies corresponding
to any other FSMX source. Therefore for each source sequence, the
context-tree weighting algorithm produces at most $2|S| - 1$ codebits
more than an estimator matched to model $S$, for any $S$.

The Eindhovens Hogeschoolfonds supported the second author when
he visited Eindhoven's Information Theory Group. Thanks.


Figure 1: Context tree for $x_1, x_2, \ldots, x_7 = 0110100$, $d = 4$, and
$\ldots, x_{-1}, x_0 = \ldots 0010$. Non-trivial Q-values are listed.

A spherical code is a collection of points on the surface of a sphere
in d-dimensional Euclidean space. A spherical t-design is a spherical
code consisting of N points $x_1, \ldots, x_N$ such that the integral of any
polynomial of degree at most t over the surface of the sphere is equal to
its average value at these N points. Given d and t, one wishes to
minimize the value of N.
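
The defining property can be checked numerically for a candidate point set. The sketch below is an illustration only and is not the method of the paper: it works in $d = 3$, takes the integral to be normalized (the average over the sphere), and tests all monomials of degree at most $t$, which suffices because every polynomial is a linear combination of monomials. The function names are hypothetical; the closed form for monomial averages in terms of the gamma function is standard.

```python
import math
from itertools import product

def sphere_moment(exps):
    """Average of x^a y^b z^c over the unit sphere in R^3 (uniform measure)."""
    if any(e % 2 for e in exps):
        return 0.0                      # odd exponents integrate to zero
    num = math.prod(math.gamma((e + 1) / 2) for e in exps)
    return num / math.gamma((sum(exps) + 3) / 2) * math.gamma(1.5) / math.gamma(0.5) ** 3

def is_spherical_design(points, t, tol=1e-12):
    """True if the point average matches the sphere average for all monomials of degree <= t."""
    for exps in product(range(t + 1), repeat=3):
        if sum(exps) > t:
            continue
        avg = sum(x ** exps[0] * y ** exps[1] * z ** exps[2]
                  for x, y, z in points) / len(points)
        if abs(avg - sphere_moment(exps)) > tol:
            return False
    return True

def icosahedron():
    """The 12 vertices of a regular icosahedron on the unit sphere."""
    phi = (1 + math.sqrt(5)) / 2
    r = math.sqrt(1 + phi * phi)
    pts = []
    for s1, s2 in product((1, -1), repeat=2):
        pts += [(0, s1 / r, s2 * phi / r),
                (s1 / r, s2 * phi / r, 0),
                (s2 * phi / r, 0, s1 / r)]
    return pts

if __name__ == "__main__":
    octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    print(is_spherical_design(icosahedron(), 4))   # 12 points in d = 3
    print(is_spherical_design(octahedron, 4))      # 6 points in d = 3
```

The icosahedron passes the test with $N = 12$ points, while the six vertices of the octahedron already fail on $x^4$ (average $1/3$ against the sphere average $1/5$).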

We have made considerable progress recently on the case t = 4 (and
informally think we have completely solved the problem). We will give
very strong numerical evidence that spherical 4-designs containing N
points in d-dimensional space with d ≤ 8 exist precisely for the following
values of N and d: N even and ≥ 2 for d = 1; N ≥ 5 for d = 2;
N = 12, 14, ≥ 16 for d = 3; N ≥ 20 for d = 4; N ≥ 29 for d = 5;
N = 27, 36, ≥ 39 for d = 6; N ≥ 53 for d = 7; and N ≥ 69 for d = 8.
These spherical codes also provide optimal (and rotatable) experimental
designs for quadratic modelling in the ball.

The full text may be found in our paper "New Spherical 4-Designs",
Discrete Math., Vol. 106/107, 1992, pp. 255-264. Details of the methods
used and the application to optimal experimental design are in our paper
"A New Approach to the Construction of Optimal Designs", J. Statist.
Planning and Inference, 1993, in press.

Abstract: A new efficient algorithm for bounded-distance decoding
of the Leech lattice is presented. The algorithm decodes correctly at
least up to the guaranteed error-correction radius of the Leech lattice.
The proposed decoder is based on projecting the points of the
Leech lattice onto the codewords of the (6,3,4) quaternary code,
the hexacode $H_6$. Projection on the hexacode induces a partition of the
Leech lattice into four cosets of $Q_{24}$, beyond the conventional
partition into two $H_{24}$ cosets. This enables bounded-distance decoding of
the Leech lattice with only 1127 real operations in the worst case, as
compared to about 3600 operations for the maximum-likelihood
decoding of [9]. The proposed algorithm is at least 30% more efficient
than Forney's algorithm [5] in terms of computational complexity,
while  the  coding  gain  loss  is  no  more  than  0.05  dB  (over  BER  rang¬ 
ing  from  10"'  to  10”®  ). 

The Leech lattice $\Lambda_{24}$ is one of the most interesting and well studied
lattices [3]. Maximum-likelihood decoding of $\Lambda_{24}$ was intensively
investigated during the last few years. Conway and Sloane [2], Forney [4],
Lang and Longstaff [6], Be'ery, Shahar, and Snyders [1], and Vardy and
Be'ery [9] have devised various decoding algorithms with complexities
ranging from 55968 down to 3595 operations. While the problem of
maximum-likelihood decoding of $\Lambda_{24}$ is interesting in its own right, in
practice it may be rewarding to use a slightly suboptimal but more
efficient bounded-distance decoding algorithm. One such algorithm was
developed by Forney [5]. The computational complexity of Forney's original
algorithm is somewhat less than 2000 operations. However, since Forney's
decoder is based on soft-decision decoding of the Golay code, utilising
the Golay decoder of [8] in Forney's bounded-distance algorithm yields a
computational complexity of less than 1500 operations. In this paper we
propose a more efficient bounded-distance decoding algorithm which requires
only 1127 operations. The proposed algorithm is shown to decode correctly
at least up to the guaranteed error-correction radius of the Leech lattice.
Simulation results, which compare the coding gain obtained using the new
algorithm with the coding gain of Forney's algorithm, are also provided.

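Our construction of the Leech lattice involves the two-dimensional
checkerboard lattice $D_2$. Partition $D_2$ into 16 subsets and arrange the
labels of the 16 subsets in the following configuration:

$$\begin{array}{cccc}
A_{000} & B_{000} & A_{110} & B_{110} \\
B_{101} & A_{010} & B_{010} & A_{101} \\
A_{111} & B_{111} & A_{001} & B_{001} \\
B_{011} & A_{100} & B_{100} & A_{011}
\end{array}$$

Tiling the entire space with nonoverlapping copies of scaled and rotated
versions of this 16-point configuration establishes a correspondence
between the labels of the 16 subsets and the points of $D_2$. Let us
represent the points of $\Lambda_{24}$ by $2 \times 6$ arrays of $D_2$ points, such as:

$$\begin{pmatrix}
A_{i_1 j_1 k_1} & A_{i_3 j_3 k_3} & \cdots & A_{i_{11} j_{11} k_{11}} \\
A_{i_2 j_2 k_2} & A_{i_4 j_4 k_4} & \cdots & A_{i_{12} j_{12} k_{12}}
\end{pmatrix} \qquad (1)$$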

The array in (1) is called type-A since it contains only $A_{ijk}$ points.
Similarly, a type-B array will consist of only $B_{ijk}$ points. Let
$(A_{i_1 j_1 k_1}, A_{i_2 j_2 k_2})^t$ be a column of a type-A array and let
$y = (i_1, j_1, i_2, j_2)$ be the corresponding binary 4-tuple. If $y$
contains an even number of nonzeros then the column is said to be even,
otherwise the column is said to be odd. The index $i_1$ is called the
h-parity of the column. The overall h-parity of the array is defined as
the modulo-2 sum of the h-parities of the six columns. The overall k-parity
of the array is the modulo-2 sum of the k subscripts of the 12 points. As
in [8] any binary 4-tuple $y$ is regarded as an interpretation of a
character $z \in \{0, 1, \omega, \bar{\omega}\} = GF(4)$. Conversely any
$z \in \{0, 1, \omega, \bar{\omega}\}$ may be regarded as the projection of four
different binary 4-tuples $y$, such that $z = (0, 1, \omega, \bar{\omega})\, y^t$. The
projection of the array is a vector $\mathbf{z} \in GF(4)^6$ consisting of the
projections of the six columns. Using the above notation the Leech lattice
may be defined as follows [9]:

Definition 1. The Leech lattice is the set of all the $2 \times 6$ arrays whose
entries are points of $D_2$, such that each array satisfies the following
conditions:
i. It is either type-A or type-B.
ii. It consists either of only even columns or only odd columns.
iii. The overall k-parity is even if the array is type-A, and odd otherwise.
iv. The overall h-parity is even if the columns are even, and odd otherwise.
v. The projection of the array is a codeword of $H_6$.
Note that by restricting condition i of the foregoing definition, that is
taking only the type-A arrays, the Leech half-lattice $H_{24}$ is obtained.
Further restricting condition ii, that is taking only even columns,
produces the Leech quarter-lattice $Q_{24}$, as defined in [9]. The proposed
decoding algorithm consists of four separate $Q_{24}$ decoders operating
concurrently. Basically the four decoders are identical. We therefore
describe only the decoder for $Q_{24}$. This decoder operates on type-A
arrays consisting of only even columns.
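
The column projection and the parity notions used in Definition 1 can be sketched as follows. This is an illustration under stated assumptions, not the decoder itself: GF(4) elements $0, 1, \omega, \bar{\omega}$ are encoded as the integers 0, 1, 2, 3, so that addition in GF(4) becomes bitwise XOR, and the function names are hypothetical.

```python
from itertools import product

# Assumed encoding of GF(4): 0 -> 0, 1 -> 1, omega -> 2, omega-bar -> 3.
# With this encoding, addition in GF(4) is bitwise XOR (e.g. 1 + omega = omega-bar).

def project(y):
    """Projection z = (0, 1, omega, omega-bar) . y^t of a binary 4-tuple y."""
    _, y2, y3, y4 = y                  # the first coordinate is multiplied by 0
    return (1 * y2) ^ (2 * y3) ^ (3 * y4)

def is_even_column(y):
    """A column is even if its 4-tuple has an even number of nonzeros."""
    return sum(y) % 2 == 0

def h_parity(y):
    """The h-parity of a column is the first index i_1 of its 4-tuple."""
    return y[0]

if __name__ == "__main__":
    for z in range(4):
        preimages = [y for y in product((0, 1), repeat=4) if project(y) == z]
        print(z, preimages)
```

Listing the preimages shows the four interpretations of each character: two even and two odd columns, and the two even interpretations differ in their h-parity.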

Precomputation: Let us assume an AWGN channel model, and let a sequence
of 12 two-dimensional symbols $\{r(n)\}_{n=1}^{12}$ be observed at the channel
output. For $n = 1, 2, \ldots, 12$ find in each $A_{ijk}$-subset of $D_2$ a point
$A_{ijk}(n)$ which is closest to $r(n)$, and set this point as a representative
of the entire subset.

Computation: For each $z \in GF(4)$ and for each of the six array locations,
the decoder first finds the preferable representation, which is the column
with the minimum squared Euclidean distance (SED) from the appropriate pair
of received symbols. This SED is taken to be the metric of $z$. Using the
acquired information the decoder finds the codeword of $H_6$ with the
minimum metric. A type-A array with even columns is then reconstructed
from this codeword of $H_6$. We show that this array is the closest to the
received sequence of symbols. Next conditions iv and iii are checked, in
this order. If either of these conditions is violated, correction is
performed for condition iv and independently for condition iii using the
"Wagner decoding rule" of [7].

The output of the $Q_{24}$ decoder is a Leech quarter-lattice point
accompanied by a corresponding metric. This point is not necessarily the
closest to the received sequence of symbols due to the independent Wagner
decoding. Finally we choose among the outputs of the four $Q_{24}$ decoders
the point with the minimal metric, and select this point as the output of
our Leech lattice decoder.

Now let $d_0$ be the minimum distance between points in the checkerboard
lattice $D_2$. The corresponding minimum SED between points in $\Lambda_{24}$ is
given by $d_{\min}^2 = 16 d_0^2$. We have the following theorem.

Theorem 1. Given a received vector of 12 two-dimensional symbols
$r = \{r(n)\}_{n=1}^{12}$, if there is a point $\lambda \in \Lambda_{24}$ such that
$\|r - \lambda\|^2 < 4 d_0^2$, the proposed algorithm decodes $r$ to $\lambda$.

Theorem 1 implies that the proposed algorithm decodes correctly at least
up to the guaranteed error correction radius of the Leech lattice
$d_{\min}/2 = 2 d_0$.
This  correction  capability  is  the  same  as  that  of  the  bounded-distance  decoder 
of  Forney  [5].  A  comprehensive  computer  simulation  has  been  performed  for 
both  the  proposed  algorithm  and  the  algorithm  of  Forney  [5].  The  simulation 
assumed  a  64-QAM  square  constellation  transmitted  over  an  AWGN  channel. 
Results show no more than 0.05 dB loss in the coding gain for our algorithm
versus  that  of  Forney,  over  the  whole  range  of  BER  from  10”  ‘  to  10”*.  This 
gain  loss  is  due  to  an  increase  in  the  effective  error  coefficient,  or  the  number 
of nearest neighbors, for the proposed algorithm. This issue will be further
elaborated  in  the  paper. 
SUMMARY

Coded modulation, which is an efficient way of
combining error correction coding with modulation, is
considered here for the case of M-ary Phase Shift Keying
(M-PSK). In this paper we are interested in the probability of
decoding error for an additive white Gaussian noise (AWGN)
channel, for which the well-known union bound is useful only
when the desired probability of error is rather small. However,
the coded modulation structure can be implemented as an
inner code concatenated with a Reed-Solomon outer code [1].
Therefore, for a low signal-to-noise ratio, a tighter upper
bound on the probability of decoding error must be derived.

The tangential union bound, which is tighter than the
union bound, is presented by Berlekamp in [2] for binary
codes. Since each transmitted codeword of an M-PSK coded
modulation structure has the same energy, a tangential union
bound can be derived for this structure as well. On the other
hand, a sphere upper bound, which is derived in [3] for any
block coded modulation scheme and is also tighter than the
union bound, is also applicable to the M-PSK constellation.
However, in the derivation of this bound, the important fact
that each transmitted codeword for the M-PSK constellation has
the same energy was not taken into account. In our paper an
upper bound, named the tangential sphere bound (for the case of
binary codes see [4]), is derived. It is also proven that our
bound is tighter than Berlekamp's tangential bound. In
Example 1 of this summary, it is shown that for a particular
scheme, which is practically important, our bound is much
tighter than the tangential bound for high levels of noise.

Consider a code with fixed length n, M codewords, a set
of Euclidean distances $\{\delta_j\}$ $(j = 1, 2, \dots, N)$ and a set of average
coefficients $\{A_j\}$ $(j = 1, 2, \dots, N)$, where $A_j$ is the average
number of pairs of codewords with the Euclidean distance $\delta_j$.

Let the additive white Gaussian noise at the input to the
decoder be a 2n-dimensional vector of random variables
denoted by $z = (z_1, z_2, \dots, z_{2n})$, and let the event of error at
the output of the decoder be denoted by E. The probability of
decoding error, $P_e$, can be written as follows:

$P_e = \Pr(E \mid z \in C_n(\theta)) \Pr(z \in C_n(\theta)) + \Pr(E \mid z \notin C_n(\theta)) \Pr(z \notin C_n(\theta)),$  (1)

where $C_n(\theta)$ is a 2n-dimensional cone with half angle $\theta$ and a
center at the point of 0 energy. Clearly, $\Pr(E \mid z \notin C_n(\theta)) \le 1$.
Assuming that the energy of each signal in the M-PSK
constellation equals one, from (1) the following tight
tangential sphere upper bound is derived


+  [  l(y)  dy}  dZj, 

Jr* 


(2) 


where  f(y)  and  f(y^)  are  the  chi-square  density  functioru,  f(sj) 
and  f(z2)  are  the  normal  density  functions,  r  is  a  real  positive 
parameter  (a  radius  of  an  2n-l  dimensional  sphere),  r^  = 

-  »i).  -  »i)  Md  o  = 

r^l  -  y4n  . 

It  is  also  proven  in  the  paper  that  the  optimal  value  of 
r,  denoted  by  ro,  is  the  root  of  the  following  equation. 

y  A,  ['iia^(p)ilp»l  (3) 


where  oos(y=y2ro  and  r(-)  is  the  Gamma  function. 


Example 1: Let the structure be a multilevel code, based
on the following component codes: at the first partition level
the binary Golay (24,12,8) code is employed, at the second
level of partition we have the single parity check (24,23,2)
code, and the remaining bits are uncoded (these codes are also
employed by the well-known Leech half lattice). The resulting
tangential sphere bound, tangential bound and union bound
are presented in Fig. 1. From Fig. 1 we deduce that for the
structure of Example 1 our upper bound is much tighter than
the union and the tangential bounds for high levels of noise.


Fig. 1. Upper bounds on the probability of decoding error for
the structure of Example 1.
A multistage decoding algorithm is given for lattices
obtained from a generalized code formula. The corresponding
multilevel construction is based on a chain of two-way
lattice partitions and a family of binary linear block
codes, whereas the multistage algorithm involves the use of
a maximum-likelihood (ML) decoding algorithm for each two-way
lattice partition as well as a soft-decision ML
decoding algorithm for each binary linear block code. The
algorithm is shown to have the same effective error-correcting
radius as ML decoding. Several known lattices
and two new ones are then constructed by the generalized
code formula. The trade-off between complexity reduction
and performance loss achieved by the proposed algorithm is
presented.


INTRODUCTION

The use of multidimensional lattices in block or trellis
codes for band-limited channels has focused the attention
of many researchers on the problem of complexity
reduction in lattice decoding. For lattices expressible in
terms of code formulas based on the chain of lattice partitions
Z/2Z/4Z/..., Forney [1] has proposed a suboptimum
algorithm with a better trade-off between complexity reduction
and performance loss when the number of levels in the
code formula is greater than one. However, the decoding of
lattices with single-level code formulas like H16, X24 and
X32 cannot benefit directly from this algorithm. The
present work extends the previous approach by generalizing
its multistage algorithm to more general code formulas.


GENERALIZED MULTISTAGE DECODING FOR THE GBCF

Given a lattice L expressed by the GBCF, the following
algorithm can be employed.

Algorithm: Given $r \in \mathbb{R}^{mn}$:

0) Set $r_0 = r$ and decode $r_0$ to the closest point
$c_0 \cdot a_0 + \gamma_1$ in the lattice $\Lambda_0 = C_0 \cdot a_0 + \Gamma_1^m$.

1) Set $r_1 = r_0 - c_0 \cdot a_0$ and decode $r_1$ to the closest point
$c_1 \cdot a_1 + \gamma_2$ in the lattice $\Lambda_1 = C_1 \cdot a_1 + \Gamma_2^m$.

...

b-1) Set $r_{b-1} = r_{b-2} - c_{b-2} \cdot a_{b-2}$ and decode $r_{b-1}$ to the
closest point $c_{b-1} \cdot a_{b-1} + \gamma_b$ in the lattice
$\Lambda_{b-1} = C_{b-1} \cdot a_{b-1} + \Gamma_b^m$.

The received point r is decoded as the lattice point
$\hat{\lambda} = c_0 \cdot a_0 + \dots + c_{b-1} \cdot a_{b-1} + \gamma_b$ of the lattice
$L = C_0 \cdot [\Gamma_0/\Gamma_1] + \dots + C_{b-1} \cdot [\Gamma_{b-1}/\Gamma_b] + \Gamma_b^m$.


GENERALIZED  BINARY  CODE  FORMULAS  (GBCF) 

The lattice construction used in the generalized
multistage decoding algorithm is based on a chain of two-way
n-dimensional lattice partitions $\Gamma_0/\Gamma_1/\dots/\Gamma_{b-1}/\Gamma_b$ with
selected sets of coset representatives and a family of
binary linear block codes $C_i$ with length m, dimension $k_i$ and
minimum Hamming distance $d_i$, $0 \le i \le b-1$.

Let $\Gamma/\Lambda$ be an elementary binary partition of order $2^k$.
Let $\{b_0, b_1, \dots, b_{k-1}\}$ be a set of vectors forming a binary
basis of the partition $\Gamma/\Lambda$, i.e., generating a system of coset
representatives of $\Lambda$ in $\Gamma/\Lambda$.


Then the following statements are proved: 1) The
generalized multistage decoding algorithm is invariant under
translation; 2) For each $\lambda \in L$, any point $r \in \mathbb{R}^{mn}$ such that
[...] is decoded to $\lambda$ by the generalized
multistage decoding algorithm; 3) For nested linear binary
block codes, $C_0 \subseteq \dots \subseteq C_{b-1}$, an exact expression for the
effective error coefficient is given; 4) Given $r_i \in \mathbb{R}^{mn}$,
the codeword $c_i \in C_i$ in step i of the generalized multistage
decoding algorithm is the one which minimizes:


Definition 1 (GBCF): Let $\Gamma_0/\Gamma_1/\dots/\Gamma_{b-1}/\Gamma_b$ be a lattice
partition chain, and let $C_0, C_1, \dots, C_{b-1}$ be linear binary block
codes with blocklength m; then a periodic array L can be
defined as follows:

“  {%•  “o  *  Vi*  Vi  =  vs- 

Let us denote the Generalized Binary Code Formula by

$L = C_0 \cdot [\Gamma_0/\Gamma_1] + C_1 \cdot [\Gamma_1/\Gamma_2] + \dots + C_{b-1} \cdot [\Gamma_{b-1}/\Gamma_b] + \Gamma_b^m$

Then the following statements are proved: 1) L is a
lattice; 2) exact expressions hold for the minimum distance and
the error coefficient of L.

We  provide  some  examples  of  lattices  L  that  can  be 
obtained  by  the  GBCF  as  well  as  new  lattices. 


j-i  ‘ 


.  (0 
where 


We show that the number of computations necessary to
decode $\Lambda_i$, $0 \le i \le b-1$, is

$N_D(\Lambda_i) = m \cdot [N_D(\Gamma_i/\Gamma_{i+1}) + 1] + N_D(C_i)$

and that the overall complexity of the generalized multistage
decoding algorithm can then be estimated by

$N_D(L) = \sum_{i=0}^{b-1} N_D(\Lambda_i) + b \cdot m \cdot n$

Finally, we determine the performance and the complexity
of the proposed algorithm for several lattices L.
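
As a rough illustration, the two complexity expressions can be evaluated numerically. The sketch below assumes they read N_D(Λ_i) = m·[N_D(Γ_i/Γ_{i+1}) + 1] + N_D(C_i) per level and N_D(L) = Σ_i N_D(Λ_i) + b·m·n overall. The function names and the operation counts used as inputs are hypothetical placeholders, not figures from the paper.

```python
def level_cost(m, partition_cost, code_cost):
    """Operations for one level: m * (N_D(partition) + 1) + N_D(code)."""
    return m * (partition_cost + 1) + code_cost


def total_cost(m, n, partition_costs, code_costs):
    """Sum the per-level costs and add the b*m*n term, b = number of levels."""
    if len(partition_costs) != len(code_costs):
        raise ValueError("need one partition cost and one code cost per level")
    b = len(code_costs)
    per_level = sum(level_cost(m, p, c)
                    for p, c in zip(partition_costs, code_costs))
    return per_level + b * m * n


# Hypothetical two-level example: m = 8, n = 2, made-up operation counts.
example = total_cost(8, 2, [1, 1], [20, 10])
```

Because the last term grows linearly in b, m and n, the per-level decoding costs N_D(C_i) typically dominate when the component codes are long.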
Introduction

Recently various schemes of coding and modulation have been proposed as
efficient methods to improve the performance of digital communication systems.
One interesting approach was presented by Imai and Hirakawa
in the early stage of research on coded modulation. In their scheme
(IH-scheme), component codes having different error protection
capabilities are employed with stepwise decoding or multistage decoding. It
is noted that the component codes are designed on the basis of Hamming
distance, but nonuniformity is introduced by letting their respective error
protection capabilities differ.

We propose a multilevel coded modulation scheme using an unequal error
protection code. The basis of the scheme can be considered as an extension
of the IH-scheme. Instead of using several error-correcting codes as in the
IH-scheme, we use a block or trellis code which has unequal error protection
capability.

To obtain large coding gain from the UEP code, the codeword is mapped
into channel symbols by using finite memories in the scheme. Figure 1 shows
the encoding and decoding block diagram of 3-level coding (e.g., 8-PSK).
In the figure, each ‘I’ is a memory unit, that is, a shift register of length n/3,
where n is the code length of block code C. BCM coding based on the same
structure for 2 levels has been proposed by M. C. Lin [1], who calls it block
coded modulation with interblock memory. We studied the structure from
the viewpoint of the application of unequal error protection codes.

We describe the minimum squared Euclidean distances of UEP-BCM
and UEP-TCM by using the separations, which measure the error
protection capability of an unequal error protection (UEP) code. Our scheme
can be considered as a generalization of his scheme.

Although the error performance of UEP-BCM is described from the viewpoint
of UEP, ordinary error correction codes having uniform error protection
capability can be applied to UEP-BCM. UEP-BCM provides an attractive
coding scheme when it is easily implemented by using algebraic
decoding. This study deals with UEP-BCM based on RS codes and BCH codes.

UEP-BCM 

The unequal error protection capability of a UEP code over Hamming
space is described by using ‘separations’. Let C be a linear (n, k) block
code over a finite field, where n is the code length and k is the number of
information symbols. We define the separation $S_i$ of a UEP code as

$S_i = \min_{v \in C,\ v_i \neq 0} W_H(v) \quad (i = 0, 1, \dots, n-1),$  (1)

where $v_i$ is the i-th symbol in codeword v, and $W_H(x)$ is the Hamming weight of
x. It is easily proved that the squared minimum Euclidean distance of UEP-BCM
based on the (n, k) block code C is bounded by the following inequality:

$D_{ED}^2 \ge \min\{ S_0 \cdot \Delta_0^2,\ S_1 \cdot \Delta_1^2,\ \dots,\ S_{n-1} \cdot \Delta_{n-1}^2 \},$  (2)

where $\Delta_i$ is the subset distance which is given by the set partitioning method.

Define $S_{ED}^2(i)$ as

$S_{ED}^2(i) = \min_{v \in C,\ v_i \neq 0} \left\{ \sum_{j=0}^{n-1} \delta(v_j, 0)\, \Delta_j^2 \right\}.$  (3)

We call $S_{ED}^2(i)$ a squared Euclidean separation. Then the squared minimum
Euclidean distance of the UEP-BCM is given by

$D_{ED}^2 = \min\{ S_{ED}^2(0),\ S_{ED}^2(1),\ \dots,\ S_{ED}^2(n-1) \}$  (4)

and $D_{ED}^2$ is not smaller than the bound given by (2).
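
The separations can be made concrete with a small brute-force sketch. It assumes a binary linear code given by a generator matrix, reads the separation of position i as the minimum Hamming weight over codewords that are nonzero in position i, and reads the squared Euclidean separation as the same minimum with each nonzero position j weighted by a squared subset distance. The function names, the toy generator matrix and the squared subset distances are illustrative assumptions, not taken from the paper.

```python
from itertools import product
import math


def codewords(generator):
    """Enumerate all codewords of a binary linear code (rows = basis vectors)."""
    k, n = len(generator), len(generator[0])
    for msg in product([0, 1], repeat=k):
        yield tuple(sum(msg[r] * generator[r][j] for r in range(k)) % 2
                    for j in range(n))


def separations(generator):
    """Hamming separation of position i: min weight over codewords with v_i != 0."""
    words = list(codewords(generator))
    n = len(generator[0])
    return [min((sum(v) for v in words if v[i] != 0), default=math.inf)
            for i in range(n)]


def squared_euclidean_separations(generator, delta_sq):
    """Same minimum, but each nonzero position j contributes delta_sq[j]."""
    words = list(codewords(generator))
    n = len(generator[0])
    return [min((sum(d for vj, d in zip(v, delta_sq) if vj != 0)
                 for v in words if v[i] != 0), default=math.inf)
            for i in range(n)]


# Toy (4, 2) code and made-up squared subset distances per position.
G = [[1, 1, 1, 0],
     [0, 0, 1, 1]]
delta_sq = [1, 1, 2, 2]
s = separations(G)
s_ed = squared_euclidean_separations(G, delta_sq)
bound = min(si * d for si, d in zip(s, delta_sq))
d_ed = min(s_ed)
```

For this toy code the exact value `d_ed` is larger than the product bound `bound`, which illustrates why the squared Euclidean separations give a sharper description than the bound built from Hamming separations alone.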

Performance of UEP-BCM

Many BCM schemes that use Viterbi decoding have been proposed. However,
one of the advantages of block coding lies in algebraic decoding. This
viewpoint is the same as that of the original IH-scheme.

A new soft-decision decoding algorithm for the proposed BCM is studied.
The algorithm can be performed by ordinary errors-and-erasures decoding after
the erasure locations are determined by the soft output of the demodulator and
the decoding results of former blocks.

By repeating the errors-and-erasures decoding while changing the erasure
locations, we can obtain better performance. The
repeated case can be regarded as a generalized minimum distance decoding algorithm
for UEP-BCM.

Figure 2 demonstrates the performance of the proposed coded scheme and
decoding algorithm obtained by numerical calculation. We evaluated the
block error probabilities of (255, 170, 86) RS coded 8-PSK and (255, 171, 23)
BCH coded 8-PSK, where the RS code is defined over GF(256). In the
figure, these are indicated by RS and BCH respectively. The evaluations are
obtained for ordinary hard-decision decoding as well as the proposed soft-decision
decoding. Those are indicated by ‘h’ and ‘s’. For comparison, we show the
event error probability of Ungerboeck's trellis coded 8-PSK (v = 9) and the
uncoded bit error rate of QPSK.

The performances are compared for different block lengths. The RS
code shows extremely good performance, even though it has a large code length. The
errors-and-erasures decoding of the distance-86 RS code may not be so simple,
and transmission of more than 1 Kbit/block is not suitable for
applications requiring small delay (e.g., voice). However, the result shows a
solution for mass data transmission with ultimate reliability.
Figure 2: Error Performance of Coded 8-PSK


Correction  and  Interpretation  of  de  Buda’s  Theorem 


Tamás Linder

Department of Telecommunications
Technical University of Budapest
1521 Stoczek u. 2, Budapest, Hungary


Christian  Schlegel 
Digital  Communications  Group 
University  of  South  Australia 
The  Levels,  SA  5095,  Australia 


Kenneth  Zeger 

Coordinated  Science  Laboratory 
University  of  Illinois 
1308  W.  Main  St.,  Urbana,  IL  61801 


ABSTRACT 

De Buda's theorem states that, for asymptotically large numbers
of dimensions, there exist "structured" codes which are optimal
for the AWGN channel. First, we point out an error in de
Buda's proof and then we correct the proof using a slightly different
approach. The original erroneous proof uses thick shells
of sphere bounded lattices for its optimal codes whereas we use
thin annulus lattice codes for the corrected proof. We discuss the
algorithmic structure of these codes as well as the implications
obtained through a coding-shaping gain argument.

SUMMARY 

We correct, clarify, and interpret a recent paper [1] by R. de
Buda, in which he states that there exist lattice based channel
codes which meet Shannon's bound for optimal codes [2]. Unfortunately,
there appears to be an error with the clever proof presented
by de Buda. Here, we carefully examine de Buda's proof
and discuss the problems. We show that de Buda's proof can be
mended, but the resulting optimal lattice code is degenerate in
the sense that its "structure" appears to be lost. More precisely,
the result in [1] is valid only for lattice codes whose code points
lie within a thin spherical shell. Such a code resembles more a
random spherical code than a lattice code.

Shannon in [2] developed tight upper and lower bounds on
the error probability of optimal codes for the AWGN channel.
His random coding argument used n-dimensional codes whose $M_n$
codewords are drawn from a uniform distribution on the surface
of a sphere of radius $\sqrt{nS}$ centered at the origin. Such codes have
transmission rate $R = \frac{1}{n} \log M_n$.

In [1] de Buda aimed at showing that there exist structured
(namely lattice based) codes for the AWGN channel that have
the same near-optimal error probability properties as Shannon's
"random" codes. To this end, de Buda considers an n-dimensional
lattice which is translated by a vector i. The bounding region of
the code is a "thick" shell (or annulus), i.e., the region T between
an outer sphere and an inner sphere both centered at the origin.

De Buda's main result claims that for each dimension n, there
exists a lattice code of the above type with at least $2^{nR}$ codepoints
such that its error probability $P_e(n)$ satisfies

$P_e(n) < 4 F_n(\theta_b, R, S/N),$  (1)

where the right side is defined in [1]. This implies that essentially
the same upper bounds are valid on the decrease of the error probability
for rates below the channel's capacity as the ones Shannon
derived for random codes.

There seems to be a technical error in [1] in the proof of (1),
with important consequences and changes in the scope of the result.
To correct the error we use a bounding region T which is
more appropriately described as a thin shell.

Fortunately there is a way to modify de Buda's proof so that
essentially all his steps remain valid. The conclusion, however, will
be somewhat different. The idea is to consider the code that results
from the radial projection of the lattice code onto the inner
sphere. In this way we get a code whose error probability is higher
than that of the lattice code. Thus choosing the inner radius for
each dimension n as $R_n = \sqrt{n S_n}$, where $S_n \uparrow S$ as $n \to \infty$, the
above argument and de Buda's corrected result show that there
exists a sequence of n-dimensional lattice codes with error probability
$P_e(n)$ for which $P_e(n) < e^{-n[E(R, S/N) - o(1)]}$ holds, where
$E(R, S/N) = \lim_{n \to \infty} -\frac{1}{n} \log F_n(\theta_b, R, S/N)$. This means that for rates
satisfying $R_c < R < C$, de Buda's lattice codes have the same reliability
exponent as that of optimal codes, and for rates below
the critical rate $R_c$ the error probability of these lattice codes has
essentially the same exponential upper bound as Shannon's code.

The shell that contains the codepoints can no longer be
called a “thick shell”; the more appropriate description is “thin
shell”. It is worth noting that since the function $F_n(\theta_b, R, S'/N)$
is continuous in S', by choosing the inner radius S' < S close
enough to S, de Buda's result guarantees the existence of an
n-dimensional lattice code whose error probability is upper bounded
by a quantity arbitrarily close to the upper bound for Shannon's
code. However, the better this approximation is the less the thin
shell bounded lattice code resembles a lattice code in the usual
sense, and the more it looks like a "random" spherical code, for
which Shannon originally proved the error bounds.

Were  de  Buda’s  original  proof  to  be  correct,  one  might  ar¬ 
gue  that  the  class  of  sphere  bounded  lattice  codes  or  even  lattice 
bounded  lattice  codes  are  asymptotically  optimal  as  the  dimension 
of  the  signal  constellation  grows.  However,  this  conclusion  initially 
appears  not  to  follow  from  our  corrected  version  of  the  proof  since 
the  codepoints  derived  from  the  lattice  are  those  which  lie  in  a 
thin  spherical  shell,  and  specifically  exclude  the  lattice  points  in¬ 
terior  to  the  inner  sphere.  Adding  these  points  to  the  code  would 
invalidate  our  presented  proof. 

In effect, the radius of the thin spherical shell is made to be
large enough that enough lattice points fall within the sphere as
needed. The main advantages of structured codes such as those
derived from lattices are generally that: (i) their points can be easily
enumerated thus avoiding an exhaustive storage of the points, and
(ii) signal decoding can be computed efficiently, using algorithms
that exploit the lattice's structure. These advantages appear to
be lost for the codes we used to correct de Buda's result.

However,  an  argument  sustaining  the  asymptotic  optimality 
of  structured  codes  can  be  given  using  a  coding/shaping  gain  ap¬ 
proach.  We  give  a  discussion  of  this  implication. 
Abstract — Several new algorithms for decoding lattice partitions
are presented. They apply to Viterbi decoding of multidimensional
trellis codes based on these partitions. In [1, 2], trellis-based
algorithms were presented for decoding the lattice partitions.
The new algorithms can achieve about 50% reduction of
the complexity of decoding the lattice partitions in terms of real
additions/comparisons compared with the algorithms of [1, 2].
The complexity of the resulting overall Viterbi decoding algorithms
still shows a modest improvement. An algorithm for soft
decision decoding of the first-order Reed-Muller code (8,4,4) or
the Gosset lattice is also presented. It involves at most 17 real
operations, thus improving on the best known algorithm.

Summary

A typical multidimensional trellis coded modulation (MTCM)
scheme  can  be  simply  described  by  two  basic  ingredients:  one  is 
the  cosets  of  a  lattice  partition  A/A',  where  A  is  a  lattice  and  A' 
is  a  sublattice  such  that  the  order  of  the  partition  is  finite;  the 
other  is  a  conventional  binary  convolutional  code.  The  output  of 
the  binary  encoder  chooses  the  coset,  and  some  other  information 
bits  specify  an  element  in  the  coset  [1,  Fig.l]. 

The  trellis  diagram  of  the  resulting  multidimensional  trellis 
code  is  essentially  the  same  as  that  of  the  convolutional  code. 
The difference is that the labels on the branches of the trellis
diagram of the convolutional code now correspond to cosets.
Thus,  a  trellis-searched  decoding  algorithm  such  as  the  Viterbi 
algorithm  can  be  used  to  decode  a  multidimensional  trellis  code. 

In  a  soft-decision  Viterbi-decoding  algorithm,  the  first  step 
of  the  decoding  requires  computing  the  branch  metrics.  This 
step  is  called  decoding  the  branches.  For  an  MTCM  based  on  a 
lattice partition, decoding the branches turns out to be equivalent
to decoding the lattice partition in use. This means that the
closest point in each of the cosets to the received point has to be
determined and the associated metrics need to be calculated.

In [1, 2], Forney gave trellis-based algorithms for decoding
lattice partitions. His algorithms are optimal among all trellis
decoders for a given coordinate order and alphabet, in the sense
that they use the smallest number of trellis
states [3]. Therefore, the expression trellis-based algorithm will
simply stand for the kind of trellis decoding algorithms described
in [2].

Certainly decoding the branches is only part of the overall decoding
procedure. However, for a code whose number of states
is small relative to its dimension, a considerable portion of the
overall decoding work is due to decoding the branches. Furthermore,
in most practical implementations, the number of trellis
code states used has been very low (typically 4 or 8, occasionally
16 but rarely more than 16) [4]. Therefore, by reducing the
complexity of decoding the branches, it is possible to achieve a
considerable reduction of the overall decoding complexity.


In  this  work,  we  present  several  new  algorithms  for  decoding 
the  lattice  partitions.  Most  of  the  algorithms  can  achieve  about 
50%  reduction  of  the  complexity  of  decoding  the  branches.  They 
result  in  modest  improvement  of  the  overall  decoding  complexity. 

Most previously known efficient algorithms for soft-decision decoding
of block codes and lattices rely on decoding each coset of a
subcode of a certain type in combination with the Wagner decoding
rule. The Wagner decoding rule applies to binary codes whose
check matrix consists of a single all-one row. It states that an
entry-by-entry hard detection of the received word is to be followed,
unless the number of 1 bits is already even, by inversion of
the least reliable bit. Our algorithms also fall into this category.
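
The Wagner decoding rule described above can be sketched in a few lines. The sketch assumes real-valued received samples with the mapping bit 0 ↔ positive, bit 1 ↔ negative, and uses the sample magnitude as the reliability; both conventions, and the function name, are illustrative assumptions rather than details from the paper.

```python
def wagner_decode(received):
    """Soft-decision decoding of a single-parity-check (even-parity) code.

    received: list of real samples, one per code bit.
    Returns the list of decoded bits with even overall parity.
    """
    # Entry-by-entry hard detection.
    bits = [1 if x < 0 else 0 for x in received]
    # If the number of 1 bits is odd, invert the least reliable bit.
    if sum(bits) % 2 == 1:
        weakest = min(range(len(received)), key=lambda i: abs(received[i]))
        bits[weakest] ^= 1
    return bits


# Parity is odd after hard detection, so the weakest sample (index 1) flips.
decoded = wagner_decode([0.9, -0.2, 1.1, 0.8])
```

The rule needs only comparisons to locate the least reliable sample, which is why it is a common building block in the operation counts discussed here.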

The complexity of the decoding is measured by the total number
of real additions and comparisons. Certainly the actual
running time always depends to some extent on the implementation
technology in use. We did try, however, to evaluate all the
algorithms in a uniform way. In compliance with the convention
established in the literature, we ignore such operations as memory
addressing, negation, taking the absolute value, and multiplication
by 2, as well as the checking of logical conditions and
modulo 2 additions.
Code optimisation for finite error rate
A. G. Burr* and T. J. Lunn†

Abstract 

We present a new construction based on Blokh and Zyablov's
generalised concatenated codes for codes and coded modulation
schemes with coding gain optimised for a given decoded word error
rate, rather than for asymptotic coding gain. It is shown that this may
be achieved by a geometric structure for the codes in which not all
neighbouring codewords are at equal distances, which implies also
that the minimum distance of the code is no longer maximised. The
technique may be applied to coded modulation schemes where the
"inner code" is a multilevel signalling constellation, or to
concatenated binary codes or codes over GF(q). The outer code may
be a block or a trellis code. The technique is illustrated with reference
to a block coded modulation scheme, block coded 8-PSK.

Introduction 

Error correcting codes and coded modulation schemes are
conventionally designed for maximum asymptotic coding gain
(ACG), i.e. for maximum coding gain as signal/noise ratio $\to \infty$ and
error rate $\to 0$. However, in a practical communication system the
asymptotic coding gain is often not the prime consideration, since
there is a finite error rate that may be tolerated: 10-* or in a
telecommunications system, or 10'^ or poorer in a speech system.
Codes designed for good ACG, and especially block or lattice codes,
may well perform poorly at such error rates [1].

It is well-known [2] that ACG may be optimised by maximising the
minimum distance between any pair of codewords. In the case of
binary block codes the distance metric is generally the minimum
Hamming distance, while for coded modulation schemes it is the
minimum Euclidean distance. However, it is also clear that this
technique may not necessarily yield the best coding gain at finite error
rates. This paper presents a technique based on Blokh & Zyablov's
generalised concatenated codes [3,4] to design codes with optimum
coding gain at finite error rates.

A (block) code with maximised minimum distance corresponds to an
optimally dense sphere packing in n-dimensional space [2]. Here
each codeword may be represented as the centre of a sphere in the
signal space surrounded by and in contact with a number of other
identical spheres. Hence the minimum Euclidean distance
between codewords is twice the radius of the spheres, and all $n_n$
neighbouring codewords lie at this distance. The word error rate may
be approximated using the union bound as:


where $\sigma$ is the standard deviation due to noise.

Geometric structure for finite error rates

For equal minimum distance codes the asymptotic coding gain is
determined by the minimum distance, and it is known that it is
maximised for given rate and dimensionality. However, at finite error
rates their performance is significantly affected by the number of
neighbours $n_n$, which can be extremely large in block coded
modulation schemes. Hence block coded modulation schemes with
good asymptotic coding gain frequently achieve a much poorer result
at practical error rates. A geometric structure that maximises coding
gain at finite error rate is not known.

We therefore propose a geometric structure for optimum performance
at finite error rate in which all neighbours no longer lie at the same
distance. In the sphere packing model, we allow some spheres to
shrink, while others grow, so that the overall number of spheres
remains the same. In the codes presented, this results in a series of
shells of neighbours containing different numbers of codewords at
different minimum distances. The word error rate then becomes:


* University of York, U.K.
† BT Laboratories, U.K.


where $n_i$, $d_i$ are the number of neighbours and the distance,
respectively, for the $i$th shell. The distances and number of
neighbours in each of the shells may then be chosen to minimise their
contribution at a particular error rate, while allowing it to increase
elsewhere, where it will exceed that error rate.
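
The shell model above can be sketched numerically. The following is a minimal illustration, assuming each shell contributes a union-bound term of the form $n_i Q(d_i / 2\sigma)$; the function names and the example shell values are hypothetical and are not taken from the codes in this summary.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def shell_contributions(shells, sigma):
    """Union-bound term n_i * Q(d_i / (2*sigma)) for each (n_i, d_i) shell."""
    return [n * q_function(d / (2.0 * sigma)) for n, d in shells]


def word_error_bound(shells, sigma):
    """Sum of the per-shell terms: the union-bound estimate of word error rate."""
    return sum(shell_contributions(shells, sigma))


# Hypothetical shells: few neighbours close in, many neighbours further out.
example_shells = [(2, 1.6), (60, 2.0), (2000, 2.6)]
for sigma in (1.0, 0.35, 0.1):
    terms = shell_contributions(example_shells, sigma)
    print(sigma, ["%.2e" % t for t in terms],
          "%.2e" % word_error_bound(example_shells, sigma))
```

With these illustrative numbers the outermost shell dominates at high noise and the innermost shell dominates at low noise, which is the trade-off the shell distances and neighbour counts are chosen to balance.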


at different distances, and comparison with a conventional scheme

Design of codes using BCM construction

The Blokh and Zyablov/Cusack construction for GCC/BCM [3,4]
provides a means of constructing codes and coded modulation
schemes with these characteristics. The code is defined by a matrix of
codes defining the choice of inner code or constellation points from a
series of partitions. The top row of the matrix chooses between
partitions having the smallest minimum Euclidean distance, and
therefore requires the code with the largest Hamming distance. The
next row corresponds to a partition with a larger Euclidean distance,
and therefore the code has a smaller Hamming distance.

Conventionally the set of Hamming distances is chosen so that the
effective minimum squared distance, given by the product of the
Hamming and the squared Euclidean distance, is equal for each row.
However, if they are chosen to give different effective distances, then
the required code structure for finite error rate is created.

The diagram (Fig. 1) shows the BER curves, using the union upper
bound, for an example with three shells, optimised for coding gain at
approximately 10'^, showing the curves for the separate shells and the
overall result. This example uses coded 8-PSK of length 31, with
(31,1,31) (repetition code), (31,20,6) (expurgated BCH), and (31,31,1)
(uncoded) codes on the top, middle and bottom rows respectively. It
can be seen that the top row contribution dominates at lower
signal/noise ratio, the bottom row at higher signal/noise ratio, while at
the design error rate the three contributions are quite similar, resulting
in a "bite" in the curve at this point which optimises the coding gain.

The BER curve for a conventionally designed BCM scheme, using the
codes (32,7,14) (extended Goppa), (32,21,4) (extended Hamming) and
(32,31,2) (even parity), is also shown in Fig. 1 for comparison. It can
be seen that while this gives a significantly higher asymptotic coding
gain, at a BER of 10*^ the coding gain is poorer by more than 1 dB.
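
The row-by-row products of Hamming distance and squared Euclidean distance for the two schemes can be tabulated with a short sketch. It assumes a unit-energy 8-PSK constellation with the usual partition chain (8-PSK, QPSK, BPSK), which is an assumption made here for illustration; the function name is hypothetical.

```python
import math

# Squared minimum Euclidean distances of the 8-PSK partition chain
# (8-PSK -> QPSK -> BPSK), assuming a unit-energy constellation.
PARTITION_SQ_DIST = [4 * math.sin(math.pi / 8) ** 2, 2.0, 4.0]


def effective_sq_distances(hamming_distances):
    """Product of Hamming distance and squared Euclidean distance per row."""
    return [d_h * d_e for d_h, d_e in zip(hamming_distances, PARTITION_SQ_DIST)]


# Hamming distances of the top, middle and bottom row codes.
proposed = effective_sq_distances([31, 6, 1])      # length-31 example
conventional = effective_sq_distances([14, 4, 2])  # length-32 example

print([round(x, 2) for x in proposed])
print([round(x, 2) for x in conventional])
```

Under this assumption the conventional scheme has nearly equal products on all three rows, while the proposed scheme spreads them widely, which is what produces separate shells at different effective distances.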
Abstract

This paper is concerned with the evaluation of the block error probability
$P_e$ of a block modulation code by optimum or suboptimum soft-decision
decoding for an AWGN channel. In the range $D_l$ of low signal-to-noise
ratios, it is feasible to evaluate $P_e$ by simulation. In the range $D_h$ of
very high signal-to-noise ratios, tight upper bounds on $P_e$ are available.
However, in most cases, there remains a gap between $D_l$ and $D_h$, and the
value of $P_e$ for signal-to-noise ratios in this gap may be of practical
interest. In general, very small values of the block error probability are
required. It is infeasible to evaluate a very small $P_e$ simply by simulation.

In this paper, the maximum-likelihood soft-decision decoding of a block
modulation code for an AWGN channel is considered, and a new method of
evaluating the block error probability $P_e$ for a wide range of
signal-to-noise ratios is presented. The evaluation of $P_e$ is reduced to
numerical computations and simulations on statistics whose mean values are
not very small. Computation results for the block error probabilities of
some block modulation codes are given.

Summary 

Let $C$ be a block code of length $n$ over a finite set $L$ of symbols. Let
$h$ be a positive integer. For $u \in L$, let $s(u)$ denote the signal point in
$\mathbf{R}^h$ represented by $u$, where $\mathbf{R}^h$ denotes the set of all
$h$-tuples of real numbers, and for an $n$-tuple $u = (u_1, u_2, \ldots, u_n)$
over $L$, let $s(u)$ denote the $hn$-tuple $(s(u_1), s(u_2), \ldots, s(u_n))$.
For $z$ and $z'$ in $\mathbf{R}^{hn}$, let $\|z - z'\|$ denote the Euclidean
distance between $z$ and $z'$.

Assume that the channel is an AWGN channel and that the maximum
likelihood decoding is used and every codeword is equally likely to be
transmitted. For simplicity, we consider the case where $s(v_0)$ is
transmitted for a fixed codeword $v_0$. Let $P_e$ denote the probability of
incorrect decoding when $s(v_0)$ is transmitted. For a codeword $v$ of
$C - \{v_0\}$, let $D_e$ be defined as

$$D_e \triangleq \bigcup_{v \in C - \{v_0\}} \{z \in \mathbf{R}^{hn} : \|z - s(v)\| < \|z - s(v_0)\|\}. \qquad (1)$$

For a nonnegative real number $r$, let $S(r)$ be the surface of the
$hn$-dimensional sphere of radius $r$. Let $D_e(r)$ be defined by

$$D_e(r) \triangleq D_e \cap S(r). \qquad (2)$$


For a surface $S$, area[$S$] denotes the area of $S$. Let $f(r)$ denote the
probability of incorrect decoding under the condition that the Euclidean
distance between transmitted $s(v_0)$ and received $hn$-tuple $z$ is $r$, and
let $g(r)$ denote the density function of the probability that the Euclidean
distance between transmitted $s(v_0)$ and received $z$ is $r$. It follows from
the definitions that

$$f(r) = \mathrm{area}[D_e(r)] / \mathrm{area}[S(r)] \le 1, \qquad (3)$$

$$P_e = \int_0^{\infty} f(r)\, g(r)\, dr, \qquad (4)$$

where

$$g(r) = \frac{r^{hn-1} e^{-r^2/(2\sigma^2)}}{2^{hn/2-1}\, \sigma^{hn}\, \Gamma(hn/2)} \qquad (5)$$

and $\sigma^2$ is the noise variance per dimension. For a codeword $v$ in
$C - \{v_0\}$ and a positive real number $r$, let $D_e(r,v)$ be defined as

$$D_e(r,v) \triangleq \{z \in \mathbf{R}^{hn} : \|z\| = r \text{ and } \|z - s(v)\| < \|z - s(v_0)\|\}. \qquad (6)$$

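Then, the ratio area[$D_e(r,v)$]/area[$S(r)$] depends only on $r/\delta$, where
$\delta = \|s(v) - s(v_0)\|$. Let $p(r/\delta)$ denote the ratio. Then,
$p(r/\delta)$ is given as follows:
1) For $r/\delta \le 1/2$,

$$p(r/\delta) = 0 \qquad (7)$$

and 2) for $r/\delta > 1/2$ and $hn \ge 2$,

$$p(r/\delta) = \frac{\Gamma(hn/2)}{\sqrt{\pi}\, \Gamma((hn-1)/2)} \int_0^{\varphi} (\sin\psi)^{hn-2}\, d\psi \qquad (8)$$

where $\varphi = \cos^{-1}(\delta/(2r))$, and a formula for
$\int (\sin\psi)^{hn-2}\, d\psi$ is known.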
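
The ratio in (7) and (8) is the fraction of a sphere's surface cut off by a spherical cap, and it can be evaluated numerically. The sketch below assumes the sphere is centred on the transmitted point and that the cap half-angle is $\cos^{-1}(\delta/(2r))$; the function name, the `dim` parameter (standing for $hn$) and the use of Simpson's rule are choices made here for illustration.

```python
import math


def cap_fraction(ratio: float, dim: int, steps: int = 2000) -> float:
    """Fraction p(r/delta) of a radius-r sphere in `dim` dimensions that lies
    closer to a point at distance delta than to the sphere's centre."""
    if dim < 2:
        raise ValueError("dim must be at least 2")
    if ratio <= 0.5:
        return 0.0
    half_angle = math.acos(1.0 / (2.0 * ratio))  # cos(angle) = delta / (2 r)
    # Composite Simpson rule for the integral of sin(psi)^(dim-2)
    # over [0, half_angle]; `steps` must be even.
    h = half_angle / steps
    total = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * math.sin(k * h) ** (dim - 2)
    integral = total * h / 3.0
    norm = math.gamma(dim / 2.0) / (math.sqrt(math.pi) * math.gamma((dim - 1) / 2.0))
    return norm * integral


print(cap_fraction(1.0, 2), cap_fraction(1.0, 3), cap_fraction(1.0, 64))
```

In two and three dimensions the result can be checked against the elementary arc and cap formulas, and for large $r/\delta$ the fraction approaches one half.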
Theory and Its Applications.


Arrange $\{\|s(v) - s(v_0)\| : v \in C - \{v_0\}\}$ into the increasing sequence
of real numbers without repetition, denoted
$d^{(1)} < d^{(2)} < \cdots < d^{(m)}$. For $1 \le i \le m$, let $C^{(i)}$ be
defined as

$$C^{(i)} \triangleq \{v \in C : \|s(v) - s(v_0)\| = d^{(i)}\}, \qquad (9)$$

and let $A^{(i)}$ denote the number of codewords in $C^{(i)}$. Let $A^{(i)}(r)$
be the number of codewords in the following subset $C^{(i)}(r)$ of $C^{(i)}$:

$C^{(i)}(r) \triangleq C^{(i)} - \{v$ : there exists a codeword $u$ of $C$ such that
(1) $\|s(v) - s(v_0)\|^2 \ge \|s(u) - s(v_0)\|^2 + \|s(u) - s(v)\|^2$
and (2) the radius of the circumscribed circle of
triangle $s(v_0)s(v)s(u)$ is not smaller than $r\}$.

Then, the following theorem [3] holds for $f(r)$.

Theorem 1: Suppose that

$$d^{(1)} = \min_{u, v \in C,\ u \ne v} \|s(u) - s(v)\|. \qquad (10)$$

Then (i) for $0 \le r \le d^{(1)}/2$, $f(r) = 0$,

(ii) for $d^{(1)}/2 < r$, $f(r)$ increases monotonically as $r$ increases,

(iii) for $d^{(1)}/2 < r \le d^{(1)}/\sqrt{3}$,

$$f(r) = \sum_{i=1}^{j} A^{(i)} p(r/d^{(i)}), \qquad (11)$$

where $j$ is the largest index such that $d^{(j)} < 2d^{(1)}/\sqrt{3}$,
and (iv) for $d^{(1)}/\sqrt{3} < r$,

$$f(r) \le \min\Big\{1, \sum_{i=1}^{l} A^{(i)}(r)\, p(r/d^{(i)})\Big\} \le \min\Big\{1, \sum_{i=1}^{l} A^{(i)} p(r/d^{(i)})\Big\}, \qquad (12)$$

where $l$ is the largest index such that $d^{(l)} < 2r$.

In the range of $r > d^{(1)}/\sqrt{3}$ such that $f(r)$ is very small, the upper
bound given by (12) may be used, and in the range of $r$ such that $f(r)$ is
not very small, $f(r)$ may be evaluated by simulation. Several examples show
that Theorem 1 is useful. Let $\bar{f}(r)$ denote the right-hand side of (12).
Then the upper bound $\int_0^{\infty} \bar{f}(r)\, g(r)\, dr$ on $P_e$ is better
than the conventional union bound at low signal-to-noise ratios.

Let $RM_{m,j}$ be the $j$th order Reed-Muller code of length $2^m$. The block
error probability of $RM_{6,2}$ as a BPSK code is evaluated. The code has a
4-section trellis diagram with 1024 states and the value of $d^{(1)}$ is 8. Let
$\tilde{P}_e$ denote the value given by (4) in which $f(r)$ is evaluated by
simulation for $7 \le r \le 15$ and by using the right-hand side of (12) for
other values of $r$. Direct simulations on the block error probabilities for
$-1.36 \le E_b/N_0 \le 3.64$ were made. The simulation results are almost the
same as the values of $\tilde{P}_e$ in this range of signal-to-noise ratios.
The values of $\tilde{P}_e$ approach the conventional union bounds for
$4.6 \le E_b/N_0$.

The block error probability of a basic multilevel 8-PSK code [1] of length
32 is also evaluated. The component codes are $RM_{5,1}$, $RM_{5,3}$ and
$P_{32}$, where $P_{32}$ denotes the all even weight code of length 32. The
value of $d^{(1)}$ is 2.828. Let $\tilde{P}_e$ denote the value given by (4) in
which $f(r)$ is evaluated by simulation for $2.3 \le r \le 4$ and by using the
right-hand side of (12) for other values of $r$. The values of $\tilde{P}_e$
are almost the same as the direct simulation results for
$2.06 \le E_b/N_0 \le 0.06$ and approach the conventional union bounds for
$6.06 \le E_b/N_0$.

The above method may be extended to suboptimum soft-decision multistage
decoding.
Abstract Two kinds of common information are defined
for two correlated random variables, and they are represented by
single letter characterizations.

Summary 

Let $(X, Y)$ be a pair of mutually correlated random variables, drawn
independently and identically distributed over time. Assume that we want to
encode $(X^N, Y^N)$ into three codewords $(V_X, V_Y, V_C)$, where $V_X$
and $V_Y$ represent the private information of $X^N$ and $Y^N$,
respectively, while $V_C$ represents their common information.

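In the special case where we can write $X = (S_X, S_C)$ and $Y = (S_Y, S_C)$,
where $S_X$, $S_Y$, $S_C$ are mutually independent, we can easily show, by
encoding $S_X^N$, $S_Y^N$, $S_C^N$ to $V_X$, $V_Y$, $V_C$ respectively,
that there exists a code satisfying the following inequalities for
any given $\delta > 0$.

$$\Pr\{X^N \ne G_X(V_X V_C)\} \le \delta, \quad \Pr\{Y^N \ne G_Y(V_Y V_C)\} \le \delta, \qquad (1)$$

$$\tfrac{1}{N}(H(V_X) + H(V_C)) \le H(X) + \delta, \quad \tfrac{1}{N}(H(V_Y) + H(V_C)) \le H(Y) + \delta, \qquad (2)$$

$$\tfrac{1}{N}(H(V_X) + H(V_Y) + H(V_C)) \le H(XY) + \delta, \qquad (3)$$

$$\tfrac{1}{N} H(X^N | V_Y) \ge H(X) - \delta, \quad \tfrac{1}{N} H(Y^N | V_X) \ge H(Y) - \delta, \qquad (4)$$

$$\tfrac{1}{N} H(X^N | V_X V_Y) \ge \tfrac{1}{N} H(V_C) - \delta, \quad \tfrac{1}{N} H(Y^N | V_X V_Y) \ge \tfrac{1}{N} H(V_C) - \delta, \qquad (5)$$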

where $G_X$ and $G_Y$ are decoders for $X$ and $Y$, respectively. These
conditions correspond to the following intuition.

1. (1): $X^N$ and $Y^N$ should be recoverable from the corresponding
private information and the common information.

2. (2)(3): Each private information and the common information
should not include any redundancy.

3. (4): Each private information should be independent of the
other information.

4. (5): The common information should convey the same
amount of information about $X^N$ and $Y^N$.

However, for general correlated random variables $X$ and $Y$, it is
impossible to construct a code satisfying all of (1)-(5).

Gács-Körner [1] and Wyner [2] defined two kinds of common
information, say $C_{GK}(X;Y)$ and $C_W(X;Y)$, respectively, which
satisfy

$$C_{GK}(X;Y) \le I(X;Y) \le C_W(X;Y) \le \min(H(X), H(Y)), \qquad (6)$$

though the condition (1) is weakened to

$$\Pr\{X^N Y^N \ne G_{XY}(V_X V_Y V_C)\} \le \delta, \qquad (7)$$

where $G_{XY}$ is a decoder for $X$ and $Y$.
The first common information $C_1(X;Y)$ is defined as follows.


In other words, $C_1(X;Y)$ is the rate of the attainable minimum
core $V_C$ of $(X^N, Y^N)$ obtained by removing each private information,
which is independent of the other information, from $(X^N, Y^N)$
as much as possible.

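In the second definition, we consider $(V_X, V_Y)$ as noncommon
information $V_P$, and we define $C_2(X;Y)$ as follows.

$$C_2(X;Y) = \lim_{\delta \to 0}\ \sup_{(V_C, V_P) \text{ satisfies (10)-(12)}} \tfrac{1}{N} H(V_C) \qquad (9)$$

$$\Pr\{X^N Y^N \ne G_{XY}(V_P V_C)\} \le \delta, \qquad (10)$$

$$\tfrac{1}{N}(H(V_C) + H(V_P)) \le H(XY) + \delta, \qquad (11)$$

$$\tfrac{1}{N} H(X^N | V_P) \ge \tfrac{1}{N} H(V_C) - \delta, \quad \tfrac{1}{N} H(Y^N | V_P) \ge \tfrac{1}{N} H(V_C) - \delta. \qquad (12)$$

In other words, $C_2(X;Y)$ is the rate of the attainable maximum
core $V_C$ such that if we lose $V_C$, then each uncertainty of $X^N$
and $Y^N$ becomes $H(V_C)$.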

The following theorem holds for these $C_1(X;Y)$ and $C_2(X;Y)$.
Theorem

$$C_1(X;Y) = I(X;Y) \qquad (13)$$

$$C_2(X;Y) = \min\{H(X), H(Y)\} \qquad (14)$$

$$C_{GK}(X;Y) \le C_1(X;Y) \le C_W(X;Y) \le C_2(X;Y) \qquad (15)$$

Proof: Omitted.

The result (13) coincides with our intuition that
$I(X;Y)$ represents a kind of common information between $X$
and $Y$.

On the other hand, the result (14) does not coincide with our
intuition, though the definition (9) seems to be reasonable.
This is caused by the fact that, in addition to the common
part, the private parts can share their uncertainty with each other by
devising the encoding. As an example, consider the special case
$X = (S_X, S_C)$ and $Y = (S_Y, S_C)$, such that
$S_X \in \{0, 1, \ldots, M_X - 1\}$, $S_Y \in \{0, 1, \ldots, M_Y - 1\}$,
$H(S_X) = \log M_X$, $H(S_Y) = \log M_Y$, $M_X \le M_Y$. Even for this case,
$C_2(X;Y) = \min\{H(X), H(Y)\}$ can be achieved by letting
$V_P = S_X \oplus S_Y$ and $V_C = (S_C, S_X)$, where $\oplus$ represents
modulo $M_Y$ summation.
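
The example construction can be checked by brute-force enumeration. The sketch below works at the single-letter level (block length 1) and assumes uniform, mutually independent $S_C$, $S_X$, $S_Y$; the alphabet sizes, function names and the single-letter simplification are all assumptions made here for illustration.

```python
import itertools
import math
from collections import Counter


def entropy(counter, total):
    """Entropy in bits of the empirical distribution given by counts."""
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def check_construction(m_c, m_x, m_y):
    """Uniform independent S_C, S_X, S_Y with
    V_P = S_X + S_Y mod m_y and V_C = (S_C, S_X)."""
    assert m_x <= m_y
    joint_x_vp, joint_y_vp, vp, vc = Counter(), Counter(), Counter(), Counter()
    decode = {}
    total = m_c * m_x * m_y
    for s_c, s_x, s_y in itertools.product(range(m_c), range(m_x), range(m_y)):
        v_p = (s_x + s_y) % m_y
        v_c = (s_c, s_x)
        x, y = (s_x, s_c), (s_y, s_c)
        joint_x_vp[(x, v_p)] += 1
        joint_y_vp[(y, v_p)] += 1
        vp[v_p] += 1
        vc[v_c] += 1
        # (V_P, V_C) must determine (X, Y) uniquely, as in condition (10).
        assert decode.setdefault((v_p, v_c), (x, y)) == (x, y)
    h_vp, h_vc = entropy(vp, total), entropy(vc, total)
    h_x_given_vp = entropy(joint_x_vp, total) - h_vp
    h_y_given_vp = entropy(joint_y_vp, total) - h_vp
    return h_vc, h_x_given_vp, h_y_given_vp, h_vp


print(check_construction(2, 3, 5))
```

With these sizes, $H(V_C)$, $H(X|V_P)$ and $H(Y|V_P)$ all come out equal to $H(X) = \min\{H(X), H(Y)\}$, and $H(V_C) + H(V_P)$ equals $H(XY)$, consistent with conditions (11) and (12).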
ABSTRACT

The entropy of a source with alphabet size $n$ is between 0 and $\log n$.
Frequently, it would be useful to have a more precise idea of what value
should be expected. For that purpose, we compute the mean and the
variance of the entropy, with the average being taken over all
probability distributions for a given alphabet size $n$. The largest difference
between $\log n$, i.e., the value for equidistribution, and the mean $h_n$ is
obtained in the limit $n \to \infty$ and is equal to $1 - \gamma$ nats/symbol,
with $\gamma$ the Euler-Mascheroni constant. This has implications for the
compression of data from memoryless sources. The variance $\sigma_n^2$ of many
statistical systems scales such that $\sigma_n^2/h_n$ goes to a constant in the
limit $n \to \infty$. Surprisingly, in the present case, we have the much
faster decay $\exp(h_n)\,\sigma_n^2 \to (\pi^2/3 - 3)$ as $n \to \infty$. The
actual values of the variance are usually so small that the average value
$h_n$ can be substituted for the entropy in most applications.


2)

$$\sigma_n^2 = \sum_{k=2}^{n} \frac{1}{k^2} - \left(\frac{\pi^2}{6} - 1\right) \frac{n-1}{n+1} \qquad (2)$$


As an immediate consequence of the first part of the theorem, we have

Corollary 2 Define $\Delta_n := \log n - h_n$; then for any $n \ge 2$:

1) $\log\left(\frac{n+1}{e}\right) < h_n < \log n$   (3)

2) $\Delta_{n+1} > \Delta_n$   (4)

3) $\lim_{n \to \infty} \Delta_n = 1 - \gamma$,   (5)

where $\gamma = 0.5772156649015329\ldots$ denotes the Euler-Mascheroni
constant, i.e., the constant that appears in Euler's infinite product
representation of the Gamma function.


1.  INTRODUCTION 

The Shannon entropy $H_n(p)$ appears as a central quantity in many
communication problems. It is defined by

$$H_n(p) := -\sum_{i=1}^{n} p_i \log p_i.$$

Note that for convenience, we use natural logarithms.

The entropy $H_n(p)$ is easily evaluated for any given distribution. Often
one faces, however, the problem of a typical value, i.e., a value that is
typical for a given alphabet size $n$. Let us assume that we can compute
the mean and variance of $H_n(p)$ with respect to $p = (p_1, \ldots, p_n)$,
where $p$ runs over all probability distributions. Then the mean value could
be seen as being a typical value, whenever the variance is small.

The average which we consider is an unweighted average over all
probability distributions, i.e., we assume that all probability
distributions $p$ from

$$S_n = \Big\{(p_1, \ldots, p_n) : p_i \ge 0,\ \sum_{j=1}^{n} p_j = 1\Big\}$$

are equally likely. The mean $h_n$ and the variance $\sigma_n^2$ of the entropy
$H_n(p)$ then become

$$h_n := \frac{1}{|S_n|} \int_0^1 dp_1 \cdots \int_0^1 dp_n\ \delta(p_1 + \cdots + p_n - 1)\, H_n(p),$$

$$\sigma_n^2 := \frac{1}{|S_n|} \int_0^1 dp_1 \cdots \int_0^1 dp_n\ \delta(p_1 + \cdots + p_n - 1)\, (H_n(p) - h_n)^2,$$

where $\delta$ is the Dirac distribution and where

$$|S_n| := \int_0^1 dp_1 \cdots \int_0^1 dp_n\ \delta(p_1 + \cdots + p_n - 1)$$

denotes the volume of $S_n$.

2. RESULTS
The computations are summarized as follows.

Theorem 1 For any $n \ge 2$, the mean $h_n$ and variance $\sigma_n^2$ of the
entropy, when averaged over $S_n$, are given by

1)

$$h_n = \sum_{k=2}^{n} \frac{1}{k} \qquad (1)$$

Corollary 2 tells that the discrepancy of the average entropy with respect
to the value obtained for equidistribution increases monotonically to the
value $1 - \gamma$ nats/symbol. The relative loss $(\log n - h_n)/h_n$ goes to 0.
For a source with entropy $h_n$, this implies that coding cannot reduce
the average code word length of a memoryless source by more than 0.61
bits/symbol ($(1 - \gamma)/\log 2 < 0.61$), whatever the size of $n$ is. The
relative gain of source coding goes to 0 with an increasing size of the
alphabet.

Consider $n$ independent random variables $\xi_1, \ldots, \xi_n$ that are
identically distributed. If that distribution has a mean and a variance, say
$\mu$ and $\sigma^2$, then $\sum_{i=1}^{n} \xi_i$ has the mean $\mu_n = n\mu$
and the variance $\sigma_n^2 = n\sigma^2$. Thus, the variance increases
proportionally to $n$, i.e., proportionally to $\mu_n$. This is what we are
typically used to. In the present case, we have a rather different behaviour:

Corollary 3

1) $\sigma_n$ is monotonically decreasing for $n \ge 3$.

2) $\lim_{n \to \infty} n\sigma_n^2 = \pi^2/3 - 3$.

Corollary 3 means that the system literally freezes in an average
behaviour. The variance goes exponentially fast to 0, in the sense that
$\lim_{n \to \infty} \exp(h_n)\,\sigma_n^2$ = constant. From a practical point
of view this implies that the entropies of distributions with a large value
of $n$ need not be computed: they are close to the average value with a high
probability. This is made more precise in the following table.

  n      h_n      sigma_n

  2      0.5      0.187
  4      1.083    0.191
  8      1.718    0.161
 16      2.381    0.124
 32      3.058    0.091
 64      3.744    0.066
128      4.433    0.047
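
The tabulated values can be reproduced with a few lines of code. The sketch below assumes two closed forms: the mean as the harmonic sum from $k = 2$ to $n$, and the variance as the standard result for a uniform distribution over the probability simplex. Both are assumptions made here for illustration; they agree with the table to three decimals and with the limit in Corollary 3. The function names are hypothetical.

```python
import math


def mean_entropy(n: int) -> float:
    """Assumed closed form h_n = sum_{k=2}^{n} 1/k, in nats."""
    return sum(1.0 / k for k in range(2, n + 1))


def entropy_variance(n: int) -> float:
    """Assumed closed form:
    sum_{k=2}^{n} 1/k^2 - (pi^2/6 - 1) * (n - 1) / (n + 1)."""
    partial = sum(1.0 / k ** 2 for k in range(2, n + 1))
    return partial - (math.pi ** 2 / 6 - 1) * (n - 1) / (n + 1)


for n in (2, 4, 8, 16, 32, 64, 128):
    print(n, round(mean_entropy(n), 3), round(math.sqrt(entropy_variance(n)), 3))
```

The same functions can be used to watch $\log n - h_n$ climb towards $1 - \gamma$ and $n\sigma_n^2$ settle near $\pi^2/3 - 3$ as $n$ grows.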

3.  CONCLUSION 

We have found a typical value $h_n$ for the entropy of any source with
an alphabet of $n$ letters. This value deserves its name in the sense that
$\sigma_n^2 \approx (\pi^2/3 - 3)\exp(-h_n)$ in the limit $n \to \infty$. The
difference between the entropy for equidistribution $\log n$ and the mean
$h_n$ increases monotonically with $n \to \infty$ but is bounded by
$1 - \gamma < 0.61 \log 2$ nats/symbol. Thus, 0.61 bits/symbol is the maximal
coding gain that can typically be expected for memoryless sources. This shows
that data compression strongly relies on memory.
One of the most useful results in the Shannon theory is the lower
bound on mutual information due to Fano.


Theorem 1. Suppose that X and Y are random variables that satisfy:

a) X and Y take values on the same finite set with cardinality M

b) either X or Y is equiprobable.

Then,

$$I(X;Y) \ge P[X = Y] \log M - h(P[X = Y]) \qquad (1)$$

where $h$ is the binary entropy function.


The purpose of this paper is to give a more general version of the
lower bound in Theorem 1 by dropping its assumptions.

The restriction that X and Y take values on the same set is made
throughout for convenience in expressing the results. It is easy to see
from the mutual information data processing theorem that it can be lifted
by replacing $P[X = Y]$ by $P[X = \phi(Y)]$, where $\phi$ is an arbitrary
function mapping the space of Y to the space of X. The assumption that at
least one of the random variables is equiprobable is a nontrivial restriction,
which we want to eliminate.

The power of Theorem 1 stems from its ability to lower bound the
mutual information between two random variables in terms of a single
parameter computable from their joint distribution: the probability that
the random variables take the same value. Since it is possible to
construct independent (nonequiprobable) random variables (X, Y) for any
arbitrarily specified $P[X = Y]$, it is apparent that dropping assumption b)
of Theorem 1 will require a lower bound that depends on the distribution
of X and Y not only through $P[X = Y]$, but through some other, hopefully
simple, quantity.

Consider  the  following  result. 


Theorem 2. Define the binary divergence function $d(x\|y)$ as the
divergence between the two-mass distributions $(x, 1-x)$ and $(y, 1-y)$. If
X and Y take values on the same set, then

$$I(X;Y) \ge d(P[X = Y] \,\|\, P[X = \bar{Y}]), \qquad (2)$$

where $\bar{Y}$ is independent of X and has the same distribution as Y.
Furthermore, equality holds in (2) if and only if

$$P_{XY}(x,y) = \begin{cases} \alpha P_X(x) P_Y(y), & x = y \\ \beta P_X(x) P_Y(y), & x \ne y \end{cases} \qquad (3)$$

Note that

$$P[X = \bar{Y}] = \sum_{\omega \in \Omega} P_X(\omega) P_Y(\omega) \qquad (4)$$

i.e., the inner product between the marginals of X and Y which, in many
cases, is easy to obtain from the description of X and Y.

Condition (3) implies that the marginals are either nonoverlapping
or both equiprobable.

We will now loosen (2) by applying the following lower bound on
binary divergence

$$d(x\|y) \ge x \log \frac{1}{y} - h(x) \qquad (10)$$

to Theorem 2, resulting in the following generalization of Theorem 1:

Theorem 3. If X and Y take values on the same set, then

$$I(X;Y) \ge P[X = Y] \log \frac{1}{P[X = \bar{Y}]} - h(P[X = Y]) \qquad (11)$$

$$\ge P[X = Y] \log \frac{1}{\max_{\omega \in \Omega} P_X(\omega)} - h(P[X = Y]) \qquad (12)$$

where by symmetry we can replace $\max_{\omega \in \Omega} P_X(\omega)$ by
$\max_{\omega \in \Omega} P_Y(\omega)$.

It is tempting to strengthen the lower bound in Theorem 3 with

$$I(X;Y) \ge P[X = Y]\, H(X) - h(P[X = Y]). \qquad (!?)$$

However, counterexamples to (!?) can be found. It is possible to modify
the incorrect bound (!?) in terms of entropy and obtain the following
result.

Theorem 4. Assume that X and Y take values on the same set and
denote

$$\rho = \inf_{\omega \in \Omega} P[X = Y \mid X = \omega] = \inf_{\omega \in \Omega} P_{Y|X}(\omega | \omega). \qquad (13)$$

Then,

$$I(X;Y) \ge \rho\, H(X) - h(P[X = Y]). \qquad (14)$$

If, in addition, $\rho \ge 1 -$ then

$$I(X;Y) \ge \rho\, H(X) - h(\rho). \qquad (15)$$

This work was supported in part by the Office of Naval Research under
Grant N00014-90-J-1734.
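
The chain of bounds in Theorems 2 and 3 can be checked numerically on a small joint distribution. The sketch below is illustrative only: the function names and the representation of the joint distribution as a square list of lists are assumptions, natural logarithms are used throughout, and the marginal inner product is assumed to lie strictly between 0 and 1.

```python
import math


def binary_entropy(x):
    """Binary entropy h(x) in nats."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log(x) - (1 - x) * math.log(1 - x)


def binary_divergence(x, y):
    """d(x||y) between (x, 1-x) and (y, 1-y), in nats; requires 0 < y < 1."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def bounds(joint):
    """joint[i][j] = P[X=i, Y=j] on a common alphabet {0, ..., M-1}.
    Returns (I(X;Y), Theorem 2 bound, bound (11), bound (12))."""
    m = len(joint)
    px = [sum(row) for row in joint]
    py = [sum(joint[i][j] for i in range(m)) for j in range(m)]
    mutual = sum(p * math.log(p / (px[i] * py[j]))
                 for i, row in enumerate(joint)
                 for j, p in enumerate(row) if p > 0)
    p_eq = sum(joint[i][i] for i in range(m))        # P[X = Y]
    p_eq_indep = sum(a * b for a, b in zip(px, py))  # P[X = Y-bar]
    thm2 = binary_divergence(p_eq, p_eq_indep)
    thm3a = p_eq * math.log(1 / p_eq_indep) - binary_entropy(p_eq)
    thm3b = p_eq * math.log(1 / max(px)) - binary_entropy(p_eq)
    return mutual, thm2, thm3a, thm3b


example = [[0.30, 0.05, 0.05],
           [0.02, 0.20, 0.08],
           [0.05, 0.05, 0.20]]
print(bounds(example))
```

When X is equiprobable the inner product of the marginals equals $1/M$, so bound (11) reduces to the Fano bound of Theorem 1.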


Relations  Between  Entropy  and  Error 
Probability* 

Meir  Feder  *  Neri  Merhav  * 

Abstract 

The relation between the entropy of a discrete random variable
and the minimum attainable probability of error made in
guessing its value is examined. While Fano's inequality provides a
tight lower bound on the error probability in terms of the entropy,
we derive a converse result - a tight upper bound on the minimal
error probability in terms of the entropy. As a consequence of
this relation, a channel coding theorem for the equivocation is
presented. At a rate $R < C$, where $C$ is the channel capacity, it
follows straightforwardly from the classical channel coding theorem
and the bounds above that the equivocation can be made
arbitrarily small (exponentially fast with the block length). This
result is proved directly for DMC's, and from this proof it is
further concluded that for $R \ge C$ the equivocation achieves its
minimal value of $R - C$ at the rate of $n^{-1/2}$, where $n$ is the block
length.


In  this  work  we  explore  the  relationship  between  the  entropy  of  a 
random  variable  and  the  minimal  error  probability  in  guessing  its  value. 
The  well  known  Fano  inequality  [1]  provides  a  tight  lower  bound  on  the 
error  probability  in  terms  of  the  entropy.  We  derive  a  converse  result 
-  a  tight  upper  bound  on  the  minimal  error  probability  in  terms  of  the 
entropy. 

Specifically, denote by $\pi(X) = 1 - \max_x p(x)$ the minimal error
probability associated with the random variable $X$ and by
$\pi(X|Y) = \int dP(y)\,[1 - p(\hat{x}|y)]$, where
$\hat{x} = \hat{x}(y) = \arg\max_x p(x|y)$, the MAP error probability given
an observation $Y$. Similarly, denote by $H(X)$ and $H(X|Y)$ the entropy and
the conditional entropy (equivocation) respectively. Fano's inequality
states that

$$H \le \Phi(\pi) = h(\pi) + \pi \log(M - 1), \qquad (1)$$

where $M$ is the size of the alphabet of $X$. We have shown a converse
result

$$H \ge \phi^*(\pi) \qquad (2)$$

where

$$\phi^*(\pi) = a_i \left(\pi - \frac{i-1}{i}\right) + \log i, \quad \frac{i-1}{i} \le \pi \le \frac{i}{i+1}, \qquad (3)$$

and $a_i = i(i+1) \log \frac{i+1}{i}$. The region in the $\pi$-$H$ plane
determined by inequalities (2) and (3) is depicted in Figure 1 for the case
$M = 8$.

The bounds above hold for $\pi(X)$ and $H(X)$, as well as for $\pi(X|Y)$ and
$H(X|Y)$, and it can be shown that both bounds are sharp, i.e. each
point on the bounds can be achieved with equality. We note that a
weaker bound $H \ge 2\pi$, which coincides with (3) only at
$0 \le \pi \le 1/2$, has been observed in, e.g., [2] and [3] pp. 520-521.

To get the bound (3) we first calculated a function $\phi(\pi)$ which is
the minimal entropy for each given value of error probability of a single
random variable; the bound $\phi^*(\pi)$ is the largest convex function that
is smaller than or equal to $\phi(\pi)$.

It is interesting to note that the bounds above affirm the intuition
that a random variable is totally random (i.e. $H = \log M$) iff it is
totally unpredictable (i.e. its minimal error probability is $(M - 1)/M$)
and conversely, a random variable is totally redundant (i.e. its entropy
is zero) iff it is fully predictable (its minimal probability of error is
zero).
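
The two bounds can be checked against the entropy of arbitrary distributions. The sketch below assumes the converse bound is the piecewise-linear function through the points $((i-1)/i, \log i)$, as in the standard statement of this result; function names are hypothetical and natural logarithms are used.

```python
import math
import random


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def fano_upper(pi, m):
    """Phi(pi) = h(pi) + pi*log(M-1): Fano's upper bound on the entropy."""
    h = 0.0 if pi in (0.0, 1.0) else -pi * math.log(pi) - (1 - pi) * math.log(1 - pi)
    return h + pi * math.log(m - 1)


def converse_lower(pi):
    """phi*(pi): piecewise-linear through the points ((i-1)/i, log i)."""
    i = 1
    while pi > i / (i + 1):
        i += 1
    a_i = i * (i + 1) * math.log((i + 1) / i)
    return a_i * (pi - (i - 1) / i) + math.log(i)


rng = random.Random(0)
weights = [rng.random() for _ in range(8)]
p = [w / sum(weights) for w in weights]
pi = 1 - max(p)  # minimal error probability when guessing X
print(converse_lower(pi), entropy(p), fano_upper(pi, 8))
```

For any distribution the entropy should fall between the two printed bounds, with equality on the lower bound for distributions that are uniform over $i$ symbols.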


*This research was supported in part by the Wolfson Research Awards
administered by the Israel Academy of Science and Humanities, at Tel-Aviv University.
*Neri Merhav is with the Department of Electrical Engineering, Technion - Israel
The relation above between entropy and error probability leads to a
statement  of  the  channel  coding  theorem  in  terms  of  the  equivocation. 
As one immediately observes, the fact that zero equivocation is achieved
iff zero error probability is achieved and the classical channel coding
theorem imply that the equivocation of the codebook can be made
arbitrarily small (exponentially fast with the block length) provided
that $R < C$. It turns out that this observation can be proved directly,
at least for DMC's. This proof provides an insight on the behavior of
the equivocation at $R \ge C$.

Specifically,  by  applying  a  random  coding  upper  bound  directly  to 
the  equivocation,  using  techniques  similar  to  Gallager’s  derivation  of 
the  coding  theorem  [3],  it  is  shown  that 

^(iCIil)  <  (l  +  (4) 

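where $H(\underline{X}|\underline{Y})$ is the equivocation of the codebook,

$$E_0(\rho, q) = -\log \sum_{y} \Big[\sum_{x} q(x)\, p(y|x)^{1/(1+\rho)}\Big]^{1+\rho}$$

and where we identify $\max_{0 \le \rho \le 1} [E_0(\rho, q) - \rho R]$ as the
random coding exponent, which is strictly positive as long as $R < C$.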
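
As an illustration of the random coding exponent, the sketch below evaluates Gallager's $E_0$ for a binary symmetric channel with uniform input and maximises $E_0(\rho, q) - \rho R$ over a grid of $\rho$ values. The choice of channel, the grid search and the function names are assumptions made here; rates are in nats.

```python
import math


def e0_bsc(rho: float, eps: float) -> float:
    """Gallager's E_0(rho, q) in nats for a BSC(eps) with uniform input q."""
    s = 1.0 / (1.0 + rho)
    inner = 0.5 * (1 - eps) ** s + 0.5 * eps ** s  # same value for both outputs
    return -math.log(2 * inner ** (1 + rho))


def random_coding_exponent(rate: float, eps: float, grid: int = 1000) -> float:
    """Maximum over 0 <= rho <= 1 of E_0(rho, q) - rho * rate, on a grid."""
    return max(e0_bsc(k / grid, eps) - (k / grid) * rate for k in range(grid + 1))


def bsc_capacity(eps: float) -> float:
    """Capacity of a BSC(eps) in nats."""
    return math.log(2) + eps * math.log(eps) + (1 - eps) * math.log(1 - eps)


eps = 0.1
print(bsc_capacity(eps))
print(random_coding_exponent(0.2, eps), random_coding_exponent(0.4, eps))
```

For this channel the exponent is positive at the rate below capacity and collapses to zero (attained at $\rho = 0$) at the rate above capacity, matching the statement that it is strictly positive only for $R < C$.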

The inequality (4) holds for any value of $R$. This bound on the
equivocation is always useful, unlike the random coding bound on the
error probability, which becomes useless when it exceeds 1. When $R = C$
we find that the optimal $\rho$ approaches zero, and by Taylor expansion
of $E_0(\rho, q)$ about $\rho = 0$ we get

$$H(\underline{X}|\underline{Y}) \le \alpha \sqrt{n}, \qquad (5)$$

where $\alpha > 0$ is some constant.

Using (5) it is further easy to see that when $R > C$ there exists a
codebook whose equivocation satisfies

$$\tfrac{1}{n} H(\underline{X}|\underline{Y}) \le R - C + O(n^{-1/2}). \qquad (6)$$

Since always $H(\underline{X}|\underline{Y}) \ge H(\underline{X}) - n \cdot \max_q I(X;Y) = nR - nC$,
we conclude that the equivocation, per input symbol, can be made exactly
$R - C$, at a rate $O(n^{-1/2})$.

We finally note that other techniques for bounding the error
probability can be used for bounding the equivocation. For example, it can
be shown directly that the expurgated error exponent, which at low
rates provides a better exponent than the random coding exponent, also
applies to the equivocation.

[1] R. Fano. Class notes for the course 6.574, Transmission of Information,
Massachusetts Institute of Technology, 1952.

[2] M. E. Hellman and J. Raviv. "Probability of error, equivocation,
and the Chernoff bound," IEEE Trans. Information Theory,
IT-16:368-372, July 1970.

[3] R. G. Gallager. Information Theory and Reliable Communication.
Wiley, New York, N.Y., 1968.


Figure 1: The functions $\Phi(\pi)$, $\phi(\pi)$ and $\phi^*(\pi)$ and the
allowable region in the $\pi$-$H$ plane
Generalized Cutoff Rates and Rényi's Information Measures
I. Csiszár (Budapest)

Rényi's entropy and divergence of order $\alpha$ are given operational
characterizations in terms of block coding and hypothesis testing, as
so-called $\beta$-cutoff rates, with $\alpha = 1/(1+\beta)$ for entropy and
$\alpha = 1/(1-\beta)$ for divergence. Out of several possible definitions
of mutual information of order $\alpha$ (for channel $W$ and input
distribution $P$) we adopt

$$I_\alpha(P,W) = \min_Q \sum_x P(x)\, D_\alpha(W(\cdot|x)\,\|\,Q).$$

This admits interpretation as a $\beta$-cutoff rate, with
$\alpha = 1/(1-\beta)$ (at least for $\alpha > 1/2$), and so does
$\max_P I_\alpha(P,W)$, the "Rényi capacity."

Geometrically, the $\beta$-cutoff rate for a discrete memoryless source or
channel is the $r$-axis intercept of the tangent of slope $\beta$ to the
curve $e(r)$, where $e(r)$ is the exponent of the probability of error
resp. of correct decoding for the best codes of rate $r$, according as $r$
is an achievable rate or not. The ordinary cutoff rate of a DMC is the
$\beta$-cutoff rate with $\beta = -1$. The $\beta$-cutoff rate for
hypothesis testing has a similar geometric representation, $e(r)$ being the
exponent of convergence of the probability of type 2 error to 0 or 1, for
the best tests of sample size $n \to \infty$ with probability $\exp(-nr)$
of type 1 error.

Summary

Rényi [1] introduced a one-parameter family of information measures. His
entropy of order $\alpha$ is

$$H_\alpha(P) = \frac{1}{1-\alpha} \log \sum_x P^\alpha(x) \qquad (\alpha \ne 1) \tag{1}$$

and the divergence of order $\alpha$ is

$$D_\alpha(P\|Q) = \frac{1}{\alpha-1} \log \sum_x P^\alpha(x)\, Q^{1-\alpha}(x) \qquad (\alpha \ne 1). \tag{2}$$

In the limit $\alpha \to 1$, the standard Shannon entropy and
Kullback-Leibler divergence are recovered. Shannon's mutual information has
several equivalent definitions whose "order $\alpha$" extensions are no
longer equivalent. Our results support the following definition of mutual
information of order $\alpha$, for a channel $W$ with input distribution $P$:

$$I_\alpha(P,W) = \min_Q \sum_x P(x)\, D_\alpha(W(\cdot|x)\,\|\,Q). \tag{3}$$

Rényi's information measures enter useful error probability bounds, as
observed by several authors. Still, few results are available that actually
identify operationally defined quantities with information measures of
order $\alpha$. One such result, due to Campbell [2], characterizes entropy
of order $\alpha$ in terms of exponential mean length of variable length
codes.

In this paper, we give operational characterizations of Rényi's
information measures (1), (2), (3) in terms of block codes and hypothesis
tests, analogous to the familiar characterizations of the standard
information measures. To this end, we introduce the concept of
$\beta$-cutoff rates, generalizing the well-known concept of cutoff rate of
a DMC that corresponds to $\beta = -1$.

I. Csiszár is with the Mathematical Institute of the Hungarian Academy of
Sciences, H-1364 Budapest, POB 127, Hungary. This research was supported by
the Hungarian National Foundation for Scientific Research, Grant 1906.

Definition. The $\beta$-cutoff rate is

(i) for source coding: the smallest resp. largest $s$ such that for every
$r > 0$, the best codes of block-length $n$ and rate $r$ have probability
of error $p_e$ satisfying

$$p_e \le \exp\{n\beta(s - r) + o(n)\} \quad \text{resp.} \quad p_e \ge 1 - \exp\{n\beta(s - r) + o(n)\}, \tag{4}$$

according as $\beta > 0$ or $\beta < 0$.

(ii) for hypothesis testing: the largest resp. smallest $s$ such that for
every $r > 0$, the best tests of sample size $n$ and probability of type 1
error $\le \exp(-nr)$ have type 2 error $p_e$ satisfying (4), according as
$\beta < 0$ or $\beta > 0$.

(iii) for channel coding: the largest resp. smallest $s$ such that for
every $r > 0$, the best codes of block-length $n$ and rate $r$ have average
probability of error $p_e$ satisfying (4), according as $\beta < 0$ or
$\beta > 0$.

(iv) for channel coding with a fixed input distribution $P$: same as in
(iii), but the codes are required to have codewords of the same type,
approaching $P$ as $n \to \infty$.

Theorem 

(i) For a DMS with distribution $P$, the $\beta$-cutoff rate equals the
Rényi entropy (1), with $\alpha = 1/(1+\beta)$, for all $\beta > -1$,
$\beta \ne 0$.

(ii) For testing a simple hypothesis $P$ against a simple alternative $Q$,
the $\beta$-cutoff rate equals the Rényi divergence (2), with
$\alpha = 1/(1-\beta)$, for all $\beta < 1$, $\beta \ne 0$.

(iii) For a DMC $\{W\}$, the $\beta$-cutoff rate with fixed input
distribution $P$ equals the Rényi mutual information (3), and the
$\beta$-cutoff rate equals $\max_P I_\alpha(P,W)$, with
$\alpha = 1/(1-\beta)$, for all $-1 < \beta < 1$, $\beta \ne 0$.

Remark. The "Rényi capacity" $\max_P I_\alpha(P,W)$ can be alternatively
represented as

$$\max_P \frac{\alpha}{\alpha-1} \log \sum_y \Big( \sum_x P(x)\, W^\alpha(y|x) \Big)^{1/\alpha}$$

or as "information radius of order $\alpha$"

$$\min_Q \max_x D_\alpha(W(\cdot|x)\,\|\,Q).$$

This Theorem is a straightforward consequence of well known results on
error exponents available, e.g., in Csiszár and Körner [3]. The reason the
author still considers this Theorem remarkable is that it appears to be the
first natural and unified operational characterization of Rényi's
information measures.
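
The two measures and the parameter maps used in the Theorem can be checked numerically. The following is a minimal sketch, assuming finite alphabets, natural logarithms, and a reference distribution `q` with full support; the function names are illustrative and not taken from the paper.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (nats); alpha = 1 gives Shannon entropy."""
    if alpha == 1:
        return -sum(x * math.log(x) for x in p if x > 0)
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1 - alpha)

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha (nats); assumes q has full support."""
    if alpha == 1:
        return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)
    s = sum(x ** alpha * y ** (1 - alpha) for x, y in zip(p, q) if x > 0)
    return math.log(s) / (alpha - 1)

def alpha_for_entropy(beta):
    """Order matched to a beta-cutoff rate in source coding: 1/(1+beta)."""
    return 1.0 / (1.0 + beta)

def alpha_for_divergence(beta):
    """Order matched to a beta-cutoff rate in hypothesis testing: 1/(1-beta)."""
    return 1.0 / (1.0 - beta)

p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]
h2 = renyi_entropy(p, alpha_for_entropy(-0.5))         # order 2
d_half = renyi_divergence(p, q, alpha_for_divergence(-1.0))  # order 1/2
```

With `beta = -1` the divergence order is 1/2, which is the order associated in the text with the ordinary cutoff rate of a DMC.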

Abstract

We prove a generalization of the Entropy-Power Inequality, and of the
Fisher-Information Inequality, to multi-dimensional linear transformation
of a vector with independent components, and use this generalization in
several applications.

Consider the (joint) differential entropy $h(A\mathbf{x})$ of a linear
transformation $\mathbf{y} = A\mathbf{x}$, $\dim A = m \times n$, where
$\mathbf{x} = x_1 \ldots x_n$ is a continuous random vector and
$h(\mathbf{y}) = E\{-\log f(\mathbf{y})\}$. In some cases, this entropy is
easily calculated or bounded. If $A$ is an invertible matrix, the linear
transformation just scales and shuffles $\mathbf{x}$, thus the entropy is
only shifted, i.e., $h(A\mathbf{x}) = h(\mathbf{x}) + \log|A|$, where
$|\cdot|$ denotes absolute value of determinant. If $A$ does not have a
full row-rank, then $h(A\mathbf{x}) = -\infty$, since there is a
deterministic relation between the components of $\mathbf{y}$. If
$\mathbf{x} = \mathbf{x}^*$ is a Gaussian vector, the linear transformation
$A$ preserves the normality and so
$h(A\mathbf{x}^*) = \frac{m}{2}\log(2\pi e\,|A\Lambda_x A^t|^{1/m})$, where
$\Lambda_x$ is the covariance matrix of $\mathbf{x}^*$.

In the above three cases $\mathbf{x}$ was an arbitrary random vector. In
what follows we restrict $\mathbf{x}$ to have independent components.
Suppose in addition that $y$ is scalar, i.e.,
$y = a_1 x_1 + \ldots + a_n x_n$; then the entropy-power inequality (EPI)
can be used to lower bound its entropy. Specifically, by one of the
equivalent forms of the EPI (see e.g. [1]),

$$h(\mathbf{a}^t \mathbf{x}) \ge h(\mathbf{a}^t \tilde{\mathbf{x}}) \tag{1}$$

where $\tilde{\mathbf{x}}$ is a Gaussian vector with independent
components, such that $h(\tilde{x}_i) = h(x_i)$, $i = 1 \ldots n$, and
$\mathbf{a}^t = (a_1, \ldots, a_n)$. Note that an explicit calculation of
the entropy in the RHS of (1) yields

$$h(\mathbf{a}^t \tilde{\mathbf{x}}) = \frac{1}{2}\log 2\pi e\,(\mathbf{a}^t P \mathbf{a}) = \frac{1}{2}\log 2\pi e \Big( \sum_{i=1}^{n} a_i^2 P_i \Big) \tag{2}$$

where $P$ is the covariance matrix of $\tilde{\mathbf{x}}$, i.e., it is a
diagonal matrix whose diagonal values are the entropy powers
$P_i = \frac{1}{2\pi e} e^{2h(x_i)}$. The inequality in (1) becomes
equality iff $\mathbf{x}$ is Gaussian (or if $n = 1$).
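
As a numerical illustration of the scalar bound (1)-(2), the sketch below evaluates the right-hand side from the component entropies alone. It assumes entropies in nats and the entropy power $P_i = e^{2h(x_i)}/(2\pi e)$; the function names are illustrative.

```python
import math

def entropy_power(h):
    """Entropy power of a scalar with differential entropy h (nats)."""
    return math.exp(2.0 * h) / (2.0 * math.pi * math.e)

def epi_lower_bound(a, h):
    """Right-hand side of (2): entropy of a^t x~ for the matched Gaussian x~.

    a: coefficients a_1..a_n; h: differential entropies h(x_1)..h(x_n).
    """
    total = sum(ai * ai * entropy_power(hi) for ai, hi in zip(a, h))
    return 0.5 * math.log(2.0 * math.pi * math.e * total)

# Two independent Uniform[0,1] components (entropy 0 nats each), y = x1 + x2.
# y has a triangular density on [0,2], whose entropy is exactly 1/2 nat.
bound_uniform = epi_lower_bound([1.0, 1.0], [0.0, 0.0])
exact_uniform = 0.5

# Gaussian components with variances 1 and 4: the bound holds with equality.
h_gauss = [0.5 * math.log(2 * math.pi * math.e * v) for v in (1.0, 4.0)]
bound_gauss = epi_lower_bound([2.0, 3.0], h_gauss)
exact_gauss = 0.5 * math.log(2 * math.pi * math.e * (4 * 1.0 + 9 * 4.0))
```

For the uniform case the bound evaluates to $\frac{1}{2}\log 2$, strictly below the exact entropy, while the Gaussian case meets the bound, in line with the equality condition stated above.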

In this paper, we generalize the lower bound (1) above to the case where
$\mathbf{y}$ may be a vector, and show:

Theorem 1 For any matrix $A$ and a vector $\mathbf{x}$ with independent
components,

$$h(A\mathbf{x}) \ge h(A\tilde{\mathbf{x}}) = \frac{m}{2}\log(2\pi e\,|APA^t|^{1/m}). \tag{3}$$

Equality in (3) holds if $\mathbf{x}$ is Gaussian or if, after omitting the
all-zero columns, $A$ becomes invertible. When $\mathbf{x}$ has i.i.d.
components and $A$ is orthonormal (i.e., $|AA^t| = 1$), (3) becomes
$\frac{1}{m} h(A\mathbf{x}) \ge h(x)$, where $h(x)$ is the entropy of a
single component. One implication of this result can be interpreted as an
increase of the entropy per degree of freedom after band-pass filtering of
a white process. As the entropy becomes higher, the random vector becomes
more Gaussian. A specific statement of this phenomenon is given by the
following application of Theorem 1:

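Theorem 2 For any matrix $A$ and a vector $\mathbf{x}$ with independent
components,

$$\frac{1}{m} D(A\mathbf{x}; A\mathbf{x}^*) \le \max_i D(x_i; x_i^*) \tag{4}$$

where $m = \mathrm{Rank}\,A$,
$D(\mathbf{y}; \mathbf{y}^*) = \int f_y \log\frac{f_y}{f_{y^*}}$ is the
Kullback-Leibler Distance (KLD) (or 'information divergence') between
$\mathbf{y}$ and $\mathbf{y}^*$, and $\mathbf{x}^*$ denotes a Gaussian
vector with the same first and second moments as $\mathbf{x}$.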

The mutual information between any pair of orthogonal projections of a
white Gaussian vector is zero (since they are independent). For the
non-Gaussian case, we use Theorem 2 to prove:

Theorem 3 Let $\mathbf{x} = x_1 \ldots x_n$ be a vector with i.i.d. samples
and let $A_1\mathbf{x}$ and $A_2\mathbf{x}$ be two orthogonal projections of
$\mathbf{x}$ such that $\mathrm{Rank}\,A_1 = r$ and
$\mathrm{Rank}\,A_2 = n - r$; then

$$\frac{1}{r} I(A_1\mathbf{x}; A_2\mathbf{x}) \ge D(x; x^*) - \frac{1}{r} D(A_1\mathbf{x}; A_1\mathbf{x}^*) \tag{5}$$

where $\frac{1}{r} I(A_1\mathbf{x}; A_2\mathbf{x})$ is the mutual
information (per sample of $A_1\mathbf{x}$) between the two projections.

Note that if $\frac{1}{r} D(A_1\mathbf{x}; A_1\mathbf{x}^*) \approx 0$ (for
large enough $n$), i.e., $A_1\mathbf{x}$ approaches normality in a KLD
sense, this mutual information is lower bounded by the (positive) KLD of
$x$. A simple example, for $r = 1$, is
$A_1\mathbf{x} = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} x_i$ (the D.C. component
of $\mathbf{x}$), where, by the strong form of the Central Limit Theorem of
[2], $D(A_1\mathbf{x}; A_1\mathbf{x}^*) \to 0$ as $n \to \infty$ (actually,
for "nice" distributions the KLD decreases rapidly with $n$). Observe that
the mutual information between the orthogonal projections of the
non-Gaussian white noise is bounded away from zero in this example.

Motivated by the duality between EPI-type inequalities for various
information theoretic measures (see [1]), an inequality analogous to (3) is
derived for the Fisher-Information matrix. Let
$K(\cdot) = \int \frac{1}{f} \nabla f \cdot \nabla f^t$ denote the
Fisher-Information matrix, with respect to a translation parameter, of a
random vector with a density $f$, where $\nabla f$ is the gradient vector
of $f$. Then,

Theorem 4

$$K(A\mathbf{x}) \le K(A\tilde{\mathbf{x}}) = \big( A K^{-1}(\tilde{\mathbf{x}}) A^t \big)^{-1} \tag{6}$$

where $\tilde{\mathbf{x}} = \tilde{x}_1 \ldots \tilde{x}_n$ is a Gaussian
vector with independent components, such that $K(\tilde{x}_i) = K(x_i)$,
$i = 1 \ldots n$.

The matrix inequality (6) is in the sense that the difference matrix is
positive semi-definite. Equality holds under the same conditions as in
Theorem 1.

A new approach to rate-distortion computation and
analysis  is  suggested  in  this  work.  We  shall  restrict  our  atten¬ 
tion  here  to  continuous  amplitude  memoryless  sources.  Much 
of  the  existing  theory  is  formulated  in  terms  of  optimization 
over  the  output  density,  particularly  the  results  leading  to  the 
Blahut  algorithm.  In  the  new  approach  we  consider  a  map¬ 
ping  from  the  unit  interval  with  the  Lebesgue  measure,  to  the 
output  space.  Instead  of  optimizing  the  output  density  di¬ 
rectly,  we  optimize  this  mapping.  The  theoretical  equivalence 
of the mapping approach (MA) to the traditional approach
is  intuitively  obvious  but  can  be  formally  shown  by  isomor¬ 
phism  theorems  for  topological  measure  spaces.  Although 
equivalent  in  principle,  the  MA  formulation  is  different,  and 
by  deriving  the  results  from  this  angle,  some  new  insights 
are  gained,  as  well  as  a  more  efficient  numerical  approach  to 
compute  the  rate-distortion  function. 

First,  the  mapping  approach  is  presented  and  its  equiv¬ 
alence  to  the  usual  approach  is  discussed.  The  optimality 
conditions  are  derived  and  are  shown  to  be  random-coding 
relatives  of  the  Lloyd  optimality  conditions  for  optimal  quan¬ 
tizer  design. 

Next, the MA formulation is used to prove that, for
the squared error distortion, the optimizing output density
is purely discrete as long as the rate-distortion function has
not merged with the Shannon lower bound. In other words,
except for the case that the bound is attained (e.g., Gaussian
source for all positive rates), the output density consists
of singularities. This could explain why explicit expressions
for the rate-distortion function are so hard to obtain. In
a paper that recently came to my attention [1], it is shown
(using a different approach) that the optimizing output is
discrete if the source density's support is not the entire
space. This result is a special case of our result here, as for
such sources the Shannon lower bound is strictly lower than
the rate-distortion function at all nonzero distortions.

We then address the analysis of the evolution of the optimizing
output densities as we decrease the distortion (that
is, as we "crawl up" the rate-distortion curve). Here we
start by showing that the functional that is minimized to find
the optimizing density is the free energy of an appropriately
defined statistical mechanics system. The slope parameter
is simply related to the temperature in the physical analogy,
and the optimizing output density is given by the isothermal
equilibrium distribution at the given temperature. Thus,
"crawling up" the rate-distortion curve is simply an annealing
process in statistical mechanics. The analysis shows that
the annealing process starts with a single output symbol at
R = 0, and consists of a sequence of phase transitions which
increase the number of symbols (or singularities) by splitting
them. It is shown that the last phase transition occurs when
the rate-distortion curve hits the Shannon lower bound, and
where the singularities split, or rather, explode into a
continuous distribution.

Finally,  we  discuss  the  applicability  of  the  mapping  ap¬ 
proach  to  practical  computation  of  rate-distortion  functions. 
Discretization  for  numerical  computation  results  in  an  al¬ 
gorithm  whose  performance  differs  from  that  of  the  Blahut 
algorithm  (BA).  BA  optimizes  over  a  grid  of  points  in  the  out¬ 
put  space  to  obtain  an  approximate  solution  (whose  quality 
depends  on  the  resolution  of  the  grid).  MA  uses  the  mapping 
which  adapts  its  effective  grid  to  the  source  distribution  and 
so  is  more  efficient.  Moreover,  as  long  as  the  Shannon  lower 
bound  is  not  attained,  the  optimal  density  is  discrete  (and 
usually  finite)  so  that  few  variables  allow  MA  to  find  the  ex¬ 
act  solution,  which  BA  approximates  using  the  entire  grid. 
Note  also  that  once  the  Shannon  lower  bound  is  attained  we 
can  explicitly  derive  the  solution,  so  numerical  evaluation  is 
no  longer  necessary.  The  MA  based  algorithm  is  closely  re¬ 
lated  to  our  VQ  design  method  [2].  Another  relative  is  [3] 
where  the  derivation  is  constrained  to  a  given  alphabet  size. 
The  MA  method  allows  the  number  of  symbols  to  grow  as 
necessary  to  obtain  the  unconstrained  rate-distortion  result. 
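
For reference, the grid-based Blahut iteration that MA is compared against can be sketched as follows. This is the standard Blahut iteration on a finite alphabet, not the mapping approach itself; the function name, the fixed iteration count, and the use of a slope parameter `beta` (rate in nats) are assumptions made for illustration.

```python
import math

def blahut_arimoto(p, d, beta, iters=200):
    """One point (R, D) of the rate-distortion curve at slope parameter beta.

    p: source probabilities; d[x][y]: distortion between source symbol x and
    output grid point y; beta > 0 trades rate against distortion.
    """
    nx, ny = len(p), len(d[0])
    q = [1.0 / ny] * ny                      # output distribution on the grid
    for _ in range(iters):
        # Optimal test channel for the current output distribution.
        Q = []
        for x in range(nx):
            w = [q[y] * math.exp(-beta * d[x][y]) for y in range(ny)]
            z = sum(w)
            Q.append([v / z for v in w])
        # Output distribution induced by that test channel.
        q = [sum(p[x] * Q[x][y] for x in range(nx)) for y in range(ny)]
    D = sum(p[x] * Q[x][y] * d[x][y] for x in range(nx) for y in range(ny))
    R = sum(p[x] * Q[x][y] * math.log(Q[x][y] / q[y])
            for x in range(nx) for y in range(ny) if Q[x][y] > 0)
    return R, D

# Uniform binary source with Hamming distortion: R(D) = ln 2 - h_b(D) nats.
R, D = blahut_arimoto([0.5, 0.5], [[0, 1], [1, 0]], beta=2.0)
```

Every grid point carries a variable here whether or not it ends up with positive mass, which is the cost the mapping approach is said to avoid when the optimal output is discrete.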

Zipf's law is a famous empirical law that is observed in the behavior of
many complex systems of surprisingly different nature. Zipf [1] found
a remarkable rank-frequency relationship in linguistics. If we consider a
long text and assign ranks to all words that occur in the text in the order
of decreasing frequencies, then the frequency $f_r$ of a word satisfies the
empirical law $f_r = c\,r^{-\beta}$, where $c$ and $\beta$ are constants and
$\beta \approx 1$. Zipf's law has been discovered independently in such
diverse situations as distribution of biological species, distribution of
income, distribution of city populations, etc. [2]

Most theoretical explanations of Zipf's law are based on the principle of
the "least effort", "minimum cost" [3], "minimum energy" [4], or on
other very specific assumptions which, in our opinion, call for further
explanations.

This paper presents a model of the development of an evolutionary
system in a form of a nonstationary branching Markov random process.
We will formulate the model in the language of evolutionary dynamics,
though it can be reformulated in terms of demography, linguistics, etc.


Consider an ecosystem consisting of populations $N_i(N)$
($i = 1, 2, \ldots, A(N)$) of species $s_i$, where $A(N)$ is the number of
different species at the $N$-th step of the process. The ecosystem is
assumed to evolve according to the following rules:

1) At the $(N+1)$th step of the process an individual of species $s_i$ is
born with probability

$$\Pr(N_i(N+1) = n_i + 1 \mid N_i(N) = n_i) = \frac{1 - c(N)}{N}\, n_i. \tag{1}$$

Here $N_i(N)$ is the random variable which is the population of species
$s_i$ at time $N$, $i = 1, 2, \ldots, A(N)$.

2) The probability that an individual of a new species $s_{A(N)+1}$ is
born at the $(N+1)$th step of the process (probability of a successful
mutation) is

$$\Pr(N_{A(N)+1}(N+1) = 1 \mid N_{A(N)+1}(N) = 0) = c(N). \tag{2}$$

Set the initial conditions:

$$N_1(1) = 1; \quad A(1) = 1. \tag{3}$$

Then for any $N$, $\sum_{i=1}^{A(N)} N_i(N) = N$. Formulae (1)-(3) define a
branching Markov process.
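
The process defined by rules 1) and 2) can be simulated directly. The sketch below is illustrative only: it assumes that, when no mutation occurs, the newborn joins an existing species with probability proportional to that species' population, so that exactly one individual is added per step; the function name and the seeded generator are choices made here.

```python
import random

def simulate_species(steps, c, seed=0):
    """Simulate the branching process for `steps` steps with constant c(N) = c.

    Returns the list of species populations [N_1, N_2, ...] at the last step.
    """
    rng = random.Random(seed)
    pops = [1]                      # initial conditions: N_1(1) = 1, A(1) = 1
    for _ in range(1, steps):
        if rng.random() < c:
            pops.append(1)          # successful mutation: a new species appears
        else:
            # An existing species i gains one individual with weight n_i / N.
            i = rng.choices(range(len(pops)), weights=pops)[0]
            pops[i] += 1
    return pops

pops = simulate_species(2000, 0.05)
ranked = sorted(pops, reverse=True)   # rank-ordered populations
```

Dividing `ranked` by the number of steps gives empirical frequencies that can be compared against rank on a log-log scale.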

We will analyze the behavior of the expected values $E(N_i)$ and the
average frequencies $f_i(N) = E(N_i(N))\,N^{-1}$. Consider two special cases
corresponding to two different assumptions about the mutation rate.

1. $c(N) = c = \mathrm{const}$, $c \ll 1$ (4)

Then, the expected number of species at step $N$ is

$$E(A(N)) = 1 + (N-1)c. \tag{5}$$

Calculation of the explicit expression of $E(N_i(N))$ is complicated by
the fact that the step $N^{(i)}$ when the species $s_i$ appears is a random
variable. After quite an intricate derivation we obtain:

(6)

Hence, for $c \ll 1$ and $i \gg 1$:

$$E(N_i(N)) \sim \Big( \frac{cN}{i} \Big)^{1-c}, \qquad f_i(N) \sim \frac{c^{1-c} N^{-c}}{i^{1-c}}, \tag{7}$$

which is Zipf's law (with the exponent slightly smaller than 1).

2. Now assume that the probability of mutation leading to the emergence
of a new species decreases with time:

$$c(N) = bN^{-q}, \quad \text{where } q \ll 1. \tag{8}$$

Then the expected number of species grows slower than $N$:

$$E(A(N)) = \frac{b}{1-q} N^{1-q}. \tag{9}$$

For large ranks $i$ the frequencies are

$$f_i(N) = \Big[ \frac{b}{(1-q)i} \Big]^{\frac{1}{1-q}} \exp\Big\{ -\frac{b}{q} \Big[ \frac{b}{(1-q)i} \Big]^{\frac{q}{1-q}} + \frac{b}{qN^q} \Big\}. \tag{10}$$

This is also Zipf's law, since the exponential factor is almost constant for
$b \ll 1$, $q \ll 1$. For example, if $b = 0.1$, $q = 0.1$, the factor
changes from 0.45 to 1 when $i$ changes from 1 to $\infty$. In contrast with
case 1 ($q = 0$), now the exponent in Zipf's law is larger than 1, and
there exists a counterpart of the thermodynamic limit ($N \to \infty$) for
the average frequencies:

$$f_i = \lim_{N \to \infty} f_i(N) = \Big[ \frac{b}{(1-q)i} \Big]^{\frac{1}{1-q}} \exp\Big\{ -\frac{b}{q} \Big[ \frac{b}{(1-q)i} \Big]^{\frac{q}{1-q}} \Big\}. \tag{11}$$

Let us address now the question of the complexity of the system
described by our model. We expect intuitively that a "good measure" of
complexity should reflect both "unpredictability" and "organization"
(which implies memory) in the behavior of a complex system. We
suggest as a measure of complexity at time $N$ the mutual information
between two successive states of the system $S_N$ and $S_{N-1}$:

$$C_N = I(S_N; S_{N-1}) = H(S_N) - H(S_N \mid S_{N-1}). \tag{12}$$

This measure agrees with our intuition since it is nonnegative and
vanishes for both extreme cases of chaotic (i.e. memoryless) systems
and, on the other hand, strictly deterministic systems.

In our model the state $S_N$ is a random vector with a random number
$A(N)$ of components:

$$S_N = (N_1(N), N_2(N), \ldots, N_{A(N)}(N)). \tag{13}$$

For large $N$ we can approximately consider random variables $N_i(N)$ as
independent. Then in case 1, approximately,

$$C_N \sim \frac{\pi^2}{6}\, cN - (1-c) \ln N. \tag{14}$$

Thus, the limit complexity per one component of the system (one
species) is

$$C_\infty = \lim_{N \to \infty} \frac{C_N}{E(A(N))} = \frac{\pi^2}{6} \ \text{nats}, \tag{15}$$

or 2.37 bits per species.

Similar analysis in case 2 gives the same limit complexity per species
(specific complexity) for $q \ll 1$. Apparently, this complexity is
characteristic for all systems which obey Zipf's law with the exponent
close to 1.
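
The numerical statements in case 2 can be checked with a few lines. The sketch below assumes that the limit frequencies have the form $f_i = [b/((1-q)i)]^{1/(1-q)} \exp\{-(b/q)[b/((1-q)i)]^{q/(1-q)}\}$, which is a reading of the damaged equation (11) and not a verified transcription; the function names are illustrative.

```python
import math

def exponential_factor(i, b, q):
    """Exponential factor of the assumed limit frequency for rank i."""
    base = b / ((1.0 - q) * i)
    return math.exp(-(b / q) * base ** (q / (1.0 - q)))

def limit_frequency(i, b, q):
    """Assumed limit frequency f_i: power law in i times the factor above."""
    base = b / ((1.0 - q) * i)
    return base ** (1.0 / (1.0 - q)) * exponential_factor(i, b, q)

factor_rank_1 = exponential_factor(1, 0.1, 0.1)        # close to 0.45
factor_large_rank = exponential_factor(1e60, 0.1, 0.1)  # close to 1

# Limit complexity per species, pi^2/6 nats, converted to bits.
complexity_bits = (math.pi ** 2 / 6.0) / math.log(2.0)
```

Under this reading the factor moves from about 0.46 at rank 1 toward 1 at large ranks, and the power-law exponent is $1/(1-q)$, which is larger than 1 as the text states.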

Noiseless diagnosis has wide applications in source coding, decision
table programming, medical diagnosis, database query processing,
quality assurance in manufacturing, and pattern recognition. In this
paper, we consider the following formulation of such a problem. Let $F$
be the fault of a system which takes value in
$\Omega = \{f_i, i = 0, \ldots, n-1\}$, and $p_i$ be the probability of
occurrence of $f_i$. Let $T = \{t_j\}$ be the set of tests available for
diagnosing the system, and $c_j$ and $s_j$ be the cost and the number
of possible responses of $t_j$, respectively. The set $T$ is sufficient
in the sense that it can distinguish all the possible faults of the
system, and the tests in $T$ are noiseless in the sense that for a
given fault, when a particular test is applied, the response of the
test is deterministic. We are interested in a testing tree which
minimizes the expected cost to identify $F$, whose cost is denoted
by $C_{\min}$. Special cases of our model can be found in [1]-[3].

In our formulation, it is assumed that only one fault can
occur. We note that this assumption is by no means restrictive,
because if multiple faults can occur, we can always regard each
possible combination of faults as a single fault, and reformulate
the problem as a single-fault problem.

Define the efficiency of a test by

$$e_j = \frac{\log s_j}{c_j}.$$

Since the number of possible responses of $t_j$ is $s_j$, the maximum
amount of entropy reduced when $t_j$ is applied is $\log s_j$. This is
achieved when all the responses are equally likely immediately
before $t_j$ is applied. Thus $e_j$ gives the maximum amount of
entropy reduced per unit cost when $t_j$ is applied.

Assume without loss of generality that the tests in $T$ are
indexed such that

$$e_1 \ge e_2 \ge \cdots$$

Now define a mapping $\rho : \mathbb{R}^+ \to \mathbb{R}^+$ as follows.
First define $\rho(x)$ for the values

$$x_r = \sum_{j=1}^{r} e_j c_j, \qquad r = 0, 1, 2, \ldots$$

by

$$\rho(x_r) = \sum_{j=1}^{r} c_j.$$

For a value between $x_r$ and $x_{r+1}$, the value of $\rho(x)$ is defined
as the interpolation of $\rho(x_r)$ and $\rho(x_{r+1})$. Thus

$$\rho(x) = \sum_{j=1}^{\gamma(x)} c_j + (x - x_{\gamma(x)}) / e_{\gamma(x)+1}$$

where $\gamma(x)$ is the largest integer $r$ such that

$$\sum_{j=1}^{r} e_j c_j \le x.$$


Theorem 1 (Lower Bound) For any testing tree, $C \ge \rho(H)$.
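
The piecewise-linear function in the lower bound is straightforward to evaluate. The sketch below assumes the efficiency $e_j = (\log s_j)/c_j$ with natural logarithms, tests given as hypothetical `(cost, responses)` pairs with at least two responses each, and an argument that does not exceed the total $\sum_j \log s_j$; the function name is illustrative.

```python
import math

def rho(tests, x):
    """Evaluate the lower-bound function rho at entropy x (nats).

    tests: list of (c_j, s_j) pairs, each with s_j >= 2.
    """
    if any(s < 2 for _, s in tests):
        raise ValueError("each test needs at least two responses")
    # Index tests by decreasing efficiency e_j = log(s_j) / c_j.
    ordered = sorted(tests, key=lambda t: math.log(t[1]) / t[0], reverse=True)
    cost, remaining = 0.0, x
    for c, s in ordered:
        gain = math.log(s)                 # e_j * c_j, the step x_j - x_{j-1}
        if remaining <= gain:
            # Linear interpolation: (x - x_r) / e_{r+1}.
            return cost + remaining * c / gain
        cost += c
        remaining -= gain
    if remaining > 1e-12:
        raise ValueError("x exceeds the total entropy the tests can remove")
    return cost

example_tests = [(3.0, 4), (1.0, 2)]       # efficiencies ln4/3 and ln2
value_at_ln4 = rho(example_tests, math.log(4))
```

In this example the binary test is the more efficient one, so it is used first and the remaining $\ln 2$ nats are charged at the rate of the four-response test, giving 2.5.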

The following lemma, which is of fundamental interest, is
instrumental in the proof of Theorem 1. Basically, it is a
generalization of Shannon's entropy bound to non-D-ary trees.

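Lemma 1 For a testing tree, define the descendancy matrix
$[a_{ik}]$, where

$$a_{ik} = \begin{cases} 1 & \text{if } f_i \text{ is a descendant of non-terminal node } k \\ 0 & \text{otherwise.} \end{cases}$$

Let $d_k$ be the number of branches of non-terminal node $k$. Then

$$\sum_i p_i \sum_k a_{ik} \log d_k \ge H.$$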

Toward obtaining upper bounds on $C_{\min}$, we introduce the
notion of irreducibility of a sufficient test set.

Definition 1 A test set $T$ is irreducible if $T$ is sufficient and
no proper subset of $T$ is sufficient.

Lemma 2 If an irreducible test set contains a test with $d$ possible
responses, the size of the test set is at most $n - d + 1$.

Theorem 2 Let the tests in a sufficient test set $T$ be indexed
such that $s_1 \le s_2 \le \cdots$. Then the size of an irreducible subset
of $T$ is at most $j^*$, where $j^*$ is the largest integer $j$ satisfying

$$n - s_j + 1 \ge j. \tag{1}$$

Theorem 3 (Universal Upper Bound) Let the tests in a sufficient
test set $T$ be indexed such that $s_1 \le s_2 \le \cdots$. Then
$C_{\min}$ is upper bounded by the total cost of the most expensive $j^*$
tests in $T$.

The universal upper bound on $C_{\min}$ does not depend on $\{p_i\}$.
This bound is particularly useful when $\{p_i\}$ is unknown. We also
obtain a refined upper bound on $C_{\min}$ which depends on $\{p_i\}$.
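
Theorems 2 and 3 translate into a short computation. The sketch below assumes tests given as hypothetical `(cost, responses)` pairs and takes the condition in Theorem 2 to be $n - s_j + 1 \ge j$ with the responses sorted in increasing order; the function names are illustrative.

```python
def j_star(n, responses):
    """Largest j with n - s_j + 1 >= j, for s_1 <= s_2 <= ... (0 if none)."""
    best = 0
    for j, s in enumerate(sorted(responses), start=1):
        if n - s + 1 >= j:
            best = j
    return best

def universal_upper_bound(n, tests):
    """Total cost of the most expensive j* tests (Theorem 3).

    n: number of faults; tests: list of (c_j, s_j) pairs.
    """
    j = j_star(n, [s for _, s in tests])
    costs = sorted((c for c, _ in tests), reverse=True)
    return sum(costs[:j])

tests = [(1, 2), (1, 2), (2, 2), (3, 4), (5, 8)]
bound = universal_upper_bound(8, tests)
```

For eight faults and these five tests, the first four tests satisfy the condition and the fifth does not, so the bound is the sum of the four largest costs. No fault probabilities enter the computation.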

Abstract

The binary input N-user adder channel models a communication
medium accessed simultaneously by N users. In this model
each user transmits binary sequences and the channel's output
on each bit slot equals the sum of the corresponding N inputs.
A uniquely decodable code for this channel is a set of N codes
- a code for each of the N users - such that the receiver can
determine all possible combinations of transmitted codewords
from their sum. Van Tilborg presented a method for determining
an upper bound on the size of a uniquely decodable code
for the two-user binary adder channel. He showed that for
sufficiently large block length this combinatorial bound converges to
the corresponding capacity region boundary.

In the present work we use a similar method to derive an
upper bound on the size of a uniquely decodable code for the
binary input N-user adder channel. The new combinatorial bound
is iterative - i.e., the bound for the (N - 1)-user case can be
obtained by projecting the N-user bound on (N - 1) combinatorial
variables and in particular it subsumes the two-user result. For
sufficiently large block length the N-user bound converges to
the capacity region boundary of the binary input N-user adder
channel.


Summary 

The binary input N-user adder channel is a discrete memoryless
channel accessed by N users. It is assumed that each
user transmits binary sequences, bit and block synchronism is
maintained, and the channel's output on each bit slot equals the
sum of the corresponding N inputs. A uniquely decodable (UD)
code for this channel is a collection of N block-length n codes -
$(C_1, C_2, \ldots, C_N)$ - such that the sums
$c_1 + c_2 + \cdots + c_N$ for any
$(c_1, c_2, \ldots, c_N) \in C_1 \times C_2 \times \cdots \times C_N$ are
different. This enables the receiver to uniquely determine all possible
combinations of transmitted codewords from their sum. The coding problem
is to find a UD code which maximizes the product
$|C_1| \cdot |C_2| \cdots |C_N|$ - i.e., a UD code having the maximum
rate-sum.

Van Tilborg considered the binary (N = 2) adder channel
[1,2]. He showed that the size of any uniquely decodable block
code pair $(C_1, C_2)$ of length n is upper bounded by

$$|C_1| \cdot |C_2| \le \sum_{k=0}^{n} \binom{n}{k} \min\{ 2^k, 2^{n-k} \}. \tag{1}$$

Furthermore, for sufficiently large n the rate-sum of the
combinatorial upper bound (1) converges to the capacity region
boundary of the binary adder channel.
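
Unique decodability and the two-user bound can both be checked by brute force for small block lengths. The sketch below assumes the two-user bound has the form $\sum_k \binom{n}{k} \min\{2^k, 2^{n-k}\}$, which is a reading of the damaged equation (1); the function names and the example code pair are illustrative.

```python
from itertools import product
from math import comb

def is_uniquely_decodable(codes):
    """True if every choice of one codeword per user gives a distinct sum.

    codes: list of codes, each a list of equal-length tuples of 0/1 bits.
    """
    seen = set()
    for words in product(*codes):
        total = tuple(sum(bits) for bits in zip(*words))   # adder channel output
        if total in seen:
            return False
        seen.add(total)
    return True

def two_user_bound(n):
    """Assumed form of the two-user bound on |C1|*|C2| at block length n."""
    return sum(comb(n, k) * min(2 ** k, 2 ** (n - k)) for k in range(n + 1))

c1 = [(0, 0), (1, 1)]
c2 = [(0, 0), (0, 1), (1, 0)]
ud = is_uniquely_decodable([c1, c2])
size = len(c1) * len(c2)
```

At block length 2 this pair is uniquely decodable with product 6, which equals the value of the bound under the assumed form, so the bound is met in this small case.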


In the present work we use a technique which resembles Van
Tilborg's method to derive an upper bound on the size of any UD
block code for the N-user adder channel. We prove the following
result.


Theorem: Let $(C_1, C_2, \ldots, C_N)$ be a uniquely decodable
block code of length n for the N-user adder channel. Then


tC,|.|C72|  •••  E 

‘i.** . *W-1 


max(2*‘,2*’>,---,2*''-‘  ) ,  2^— EilT* M  | 

(2) 


The upper bound (2) is iterative - i.e., the (N - 1)-user bound
can be obtained by projecting the r.h.s. of (2) on a subspace of
(N - 1) combinatorial variables (e.g. by setting $k_{N-1} = 0$).

For N = 3 the bound admits the form

ic,M«|.|c.i  <  £ 

min  I  max  (2*, 2^)  ,  |  (3) 

which yields Van Tilborg's result (1) upon projection on the
N = 2 plane.

The asymptotic behavior of the rate-sum corresponding to
the r.h.s. of (2) is investigated. We show that for sufficiently
large n the rate-sum is lower bounded by

$$\sum_{i=1}^{N} R_i \ge 1 + \frac{1}{2} \log_2 N. \tag{4}$$

The lower bound (4) is identical to the capacity region boundary
for N = 2 and is shown to be fairly close to the capacity region
boundary for $N \ge 3$.

Coding techniques for the synchronized
multiple-access binary adder channel with idle
sources are studied. Based on Lindstrom's
combinatory detection algorithm, a class of
uniquely decodable multiple-user codes is
constructed. The rate sums of these codes are
asymptotically equal to the maximum achievable
values. Each constituent code has a zero
vector, and two nonzero vectors which are 1's
complement of each other. The source information
bits "0" and "1" are encoded by using
these two nonzero complementary vectors, and
the idle state of source is represented by the
zero vector. This approach is quite similar
to the direct-sequence spread-spectrum
multiple-access method. This coding mechanism
will provide an exploratory methodology to fill
the gaps among random access collision resolution,
multiple-user information theory and
spread spectrum.

Abstract: This paper addresses the construction of q-ary
cyclic codes for the synchronous noiseless T-user q-ary
adder channel (T-QAC). This construction is adaptive in the
sense that the decoder will correctly identify any t active
users, $0 \le t \le T$, and correctly recover their respective
messages, i.e., any subset of t active users (unknown in
advance to the decoder) will be uniquely decoded. A very low
complexity decoding procedure is given and it is shown that
the maximum achievable sum rate is 1.

INTRODUCTION 

In a recent paper [1] Mathys introduced a class of codes
for the synchronous noiseless T active out of N multiple-
access channel which is a discrete-time real adder channel
without feedback with N real or binary inputs. These codes
are uniquely decodable and have a sum rate that approaches 1
if the decoder is informed of which T or fewer users are
active. That sum rate is reduced to a value of at most 1/2
if the decoder has to identify the subset of active users
which in this case is limited to at most T/2.

We  remark  that  two  desirable  properties  of  codes 
designed  for  a  code  division  multiple-access  (CDMA)  communi¬ 
cation  system  are  the  possibility  of  identification  by  the 
decoder  of  the  active  users  without  sacrificing  code  rate 
and  the  availability  of  a  low  complexity  decoding  procedure. 

In what follows we prove a theorem which allows the use
of q-ary cyclic codes in a synchronous noiseless T-user real
adder channel in such a manner that they are uniquely
decodable. The maximum sum rate achieved is 1. This relatively
low maximum sum rate is compensated for by the fact
that the resulting decoder satisfies the two desirable
properties mentioned above.

We consider the factorization of $x^n - 1$ over GF(q),
assuming that n and q are relatively prime, which we denote
as (n,q)=1. The condition (n,q)=1 implies that $x^n - 1$ has no
repeated irreducible factors over GF(q). Let
$g_1(x), g_2(x), \ldots, g_T(x)$ denote a set of T polynomials which are
factors of $x^n - 1$ and are pairwise relatively prime over GF(q). We note
that $\max \sum_i \deg[g_i(x)] = n$. Since $g_i(x)$ and
$(x^n - 1)/g_i(x) = h_i(x)$, $1 \le i \le T$, are relatively prime
polynomials and $g_i(x)$ has degree at least 1, it follows by the greatest
common divisor theorem for polynomials that there exists $\beta_i(x)$ such
that

$$\beta_i(x)\, h_i(x) \equiv 1 \bmod g_i(x), \qquad 1 \le i \le T. \tag{1}$$

Let us assume a noiseless synchronous T-QAC. Let $m_i(x)$
denote the message polynomial for user i. Let $h_i(x)$ be the
generator polynomial of the cyclic code allocated to user i.
The codewords of user i are generated in the usual manner by
computing $m_i(x) h_i(x)$ but, before being transmitted, each
codeword is multiplied by $\beta_i(x)$ and reduced modulo $x^n - 1$.
Obviously the operations of encoding and multiplying by
$\beta_i(x)$ can be done simultaneously.

THEOREM: Let $C_1, C_2, \ldots, C_T$ be blocklength n q-ary cyclic
codes with message polynomials $m_1(x), m_2(x), \ldots, m_T(x)$ and
generator polynomials $h_1(x), h_2(x), \ldots, h_T(x)$, respectively.
Then the t-tuple $(C_1, C_2, \ldots, C_t)$ is uniquely
decodable on the synchronous noiseless t-user q-ary adder
channel and has a maximum sum rate of 1 achieved when t=T.
PROOF: We note that $\max \deg[m_i(x)] = \deg[g_i(x)] - 1$,
$1 \le i \le T$. By the Chinese remainder theorem for polynomials
[2, pp. 287-288] it follows that the polynomial sum
$r(x) = m_1(x) h_1(x) \beta_1(x) + \cdots + m_t(x) h_t(x) \beta_t(x)$,
over GF(q), and $1 \le t \le T$, is uniquely determined by
$m_1(x), m_2(x), \ldots, m_t(x)$ when $\deg[m_i(x)] < \deg[g_i(x)]$,
$1 \le i \le t$ and $\deg[r(x)] < n$. Therefore it follows that
r(x), considered as a real sum of polynomials, is also
uniquely determined by $m_1(x), m_2(x), \ldots, m_t(x)$. Since each code
$C_i$ has $\deg[g_i(x)]$ information symbols and
$\max \sum_i \deg[g_i(x)] = n$, it follows that the maximum sum rate is 1.

□

The situation when $m_i(x) = 0$ may be confused by the decoder
with the situation when user i is not active. Such ambiguities
can be avoided, for example, by forbidding the
messages $m_i(x) = 0$.

To decode the information sent by user i, i.e., to
extract $m_i(x)$ from r(x), we simply apply the Chinese
remainder theorem in reverse order, i.e., we compute over
GF(q) the remainder of the division of r(x) by $g_i(x)$, $1 \le i \le t$.
Since $g_i(x)$ is a factor of $m_j(x)h_j(x)\beta_j(x)$, $1 \le j \le t$, $j \ne i$,
it follows that $r(x) \equiv m_i(x)h_i(x)\beta_i(x) \bmod g_i(x)$. However,
from (1) and the assumption that $\deg[m_i(x)] < \deg[g_i(x)]$ it
follows that $r(x) \equiv m_i(x) \bmod g_i(x)$.
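
The encoding and decoding steps above can be sketched for a small binary case. The following is a minimal illustration, assuming q = 2, n = 7 and the factorization $x^7 - 1 = (x+1)(x^3+x+1)(x^3+x^2+1)$ over GF(2); the choice of three users, the bitmask representation of polynomials, and all function names are assumptions made for this example and are not taken from the paper.

```python
# GF(2) polynomials are stored as integers: bit k is the coefficient of x^k.

def pmul(a, b):
    """Carry-less product of two GF(2) polynomials."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):
    """Quotient and remainder of a divided by b over GF(2)."""
    q, db = 0, b.bit_length()
    while a.bit_length() >= db:
        shift = a.bit_length() - db
        q ^= 1 << shift
        a ^= b << shift
    return q, a

N = 7
XN1 = (1 << N) | 1                    # x^7 - 1 (= x^7 + 1 over GF(2))
G = [0b11, 0b1011, 0b1101]            # g_1 = x+1, g_2 = x^3+x+1, g_3 = x^3+x^2+1
H = [pdivmod(XN1, g)[0] for g in G]   # h_i = (x^n - 1) / g_i

def inverse_mod(a, g):
    """beta with beta * a = 1 mod g, by exhaustive search (g is tiny here)."""
    for beta in range(1, 1 << (g.bit_length() - 1)):
        if pdivmod(pmul(a, beta), g)[1] == 1:
            return beta
    raise ValueError("no inverse")

BETA = [inverse_mod(h, g) for h, g in zip(H, G)]

def encode(i, m):
    """Word sent by user i: m_i h_i beta_i mod (x^n - 1), as a 0/1 list."""
    c = pdivmod(pmul(pmul(m, H[i]), BETA[i]), XN1)[1]
    return [(c >> k) & 1 for k in range(N)]

def channel(words):
    """Noiseless real adder channel: componentwise integer sum."""
    return [sum(col) for col in zip(*words)]

def decode(i, r):
    """Reduce the real sum to GF(2), then take the remainder modulo g_i."""
    rb = sum((v % 2) << k for k, v in enumerate(r))
    return pdivmod(rb, G[i])[1]

msgs = [1, 0b101, 0b011]              # deg m_i < deg g_i for each user
r = channel([encode(i, m) for i, m in enumerate(msgs)])
decoded = [decode(i, r) for i in range(3)]
```

In this toy setting the three users carry 1 + 3 + 3 = 7 information bits in 7 channel uses, which matches the maximum sum rate of 1 stated in the theorem.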

A  CLASS  OF  EQUAL  RATE  BINARY  CYCLIC  CODES 

Let $n = 2^m - 1$ be a Mersenne prime. It is well known that,
except for $x - 1$, all the remaining (n-1)/m irreducible
factors of $x^n - 1$ have degree m. We can therefore construct a
class of equal rate binary cyclic codes for the T-user
binary adder channel (T-BAC) as follows. Let T be a divisor
of (n-1)/m, i.e., Ts = (n-1)/m. Let $g_i(x)$, $1 \le i \le T$, be
chosen as the product of s distinct degree m irreducible
factors of $x^n - 1$ and such that $(g_i(x), g_j(x)) = 1$, $i \ne j$. The
generator polynomial for user i, denoted by $h_i(x)$, is as
previously defined. The binary codes constructed when T=2,
i.e., codes for the 2-BAC, in general do not satisfy the
sufficient condition for unique decodability given in [3].
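
As a small worked illustration of the parameter choices in this construction, the sketch below lists, for a given Mersenne exponent m, every admissible number of users T, the corresponding s, and the per-user rate $s m / n$ (the number of information symbols $\deg g_i$ divided by n). The function names and the use of exact fractions are assumptions made for this example.

```python
from fractions import Fraction

def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def equal_rate_options(m):
    """All (T, s, per-user rate) with T*s = (n-1)/m for n = 2^m - 1 prime."""
    n = 2 ** m - 1
    if not is_prime(n):
        raise ValueError("2^m - 1 is not a Mersenne prime")
    count = (n - 1) // m              # number of degree-m irreducible factors
    return [(T, count // T, Fraction((count // T) * m, n))
            for T in range(1, count + 1) if count % T == 0]

options = equal_rate_options(5)       # n = 31, six factors of degree 5
```

For every option the sum rate is T·s·m/n = (n-1)/n, slightly below 1 because the factor x - 1 is not used.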
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Abstract  Given  any  group  G,  the  G-adder  channel  is  the  chan¬ 
nel  with  T  inputs  taking  values  in  G  and  output  equal  to  the  sum 
(over  G)  of  the  inputs.  An  F-adder  channel  is  a  G-adder  channel 
where  G  is  the  additive  group  of  a  finite  field  F.  Similarly,  the  R- 
adder  channel  is  the  one  corresponding  to  the  usual  field  R  of  real 
numbers.  The  Gaussian  multiple-access  channel  is  the  cascade  of  the 
R-adder channel with the (single-user) additive white Gaussian noise
channel. Multiple-access multiple-rate codes for F-adder channels
are defined and two constructions for such codes are given. In order
to use such codes on the Gaussian multiple-access channel, the latter
is decomposed into a number, say $l$, of F-adder channels. This is
done via a construction involving a lattice with sufficient coding gain
to reduce the error-probability to a negligible value and sublattices
of it by means of which we form a suitable chain of finite quotient
groups.  The  multiple-access  codes  described  are  well  suited  for  use 
with  random-access  protocols  with  multiple  reception. 

SUMMARY 

In this paper we present a new approach to construct multiple-
access codes for F-adder channels, for the R-adder channel, and for
the Gaussian multiple-access channel (see the Abstract for the
definition of these channels).

The  first  part  of  our  paper  focuses  on  F-adder  channels.  We 
begin  by  introducing  a  set  of  definitions  that  we  deem  convenient 
for an algebraic approach to coding for F-adder channels. Then two
constructions leading to multiple-access multi-rate codes with
sum-rate constraint are given.

Let $R^i = K^i/N$ be the largest rate needed at node i, $i = 1, 2, \ldots, T$,
where N is the blocklength common to all codes. A multi-rate code
$V^i$ for node i is a set of $K^i + 1$ linear block codes, one for each rate r
in the set $W := \{k/N : k \in \{0, 1, \ldots, K^i\}\}$. The need for multi-rate
codes  derives  from  the  source  model  which  is  assumed  to  be  bursty 
(see  Gallager’s  comments  below). 

A multiple-access multi-rate code is a set of T (one for each channel
input node) multi-rate codes. The multiple-access multi-rate codes
obtained via our constructions have sum-rate constraint in the sense
that decoding upon observation of the received sum-word is successful,
provided that the sum-rate satisfies $\sum_{i=1}^{T} r^i \le R$, where $r^i$ is
the rate of the linear block code used at node i and R is a design
parameter not exceeding 1.

In the second part of our talk we focus attention on the Gaussian
multiple-access channel. By means of T "modulators" and a
"demodulator," we decompose the Gaussian multiple-access channel
into a number, say $l$, of independent F-adder channels that are used
as described in the first part of our presentation. The decomposition
procedure can be summarized as follows. We start with an appropriate
lattice $S_0$ as input signal set to transform the Gaussian
multiple-access channel into a virtually error-free F-adder channel
(see e.g. [1]). From $S_0$ and any sublattice $S_1$ one obtains a quotient
group $S_0/S_1$. We assume that the choice of $S_0$ and $S_1$ is such that
$S_0/S_1$ is isomorphic to the additive group of a finite field F. This
allows us to decompose the R-adder channel into an F-adder channel
and an independent residual F-adder channel with inputs in $S_1$. The


procedure can be repeated with the residual channel, provided that
the F-adder channel is used to transmit codewords of a multiple-
access code over F. If no power constraint is given, this procedure
can be repeated indefinitely. However, if a power constraint is given,
one has to stop after a finite number of steps $l$, namely when the only
element of $S_l$ that satisfies the power constraint is the zero element.
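
The level-by-level decomposition can be illustrated with a deliberately simple noiseless toy case. The sketch below assumes the chain $\mathbb{Z} \supset 2\mathbb{Z} \supset 4\mathbb{Z}$ per channel use (so each quotient is isomorphic to GF(2)), two users, blocklength 3, and a hypothetical invertible generator matrix whose rows are split between the users; noise, coding gain and the power constraint are ignored, and decoding is by brute force. None of these specifics come from the paper.

```python
from itertools import product

# Hypothetical stacked generator matrix over GF(2). Row 0 belongs to user 0,
# rows 1 and 2 to user 1. It is invertible, so the sum rate is 1.
G = [(1, 1, 0), (0, 1, 1), (0, 0, 1)]
OWNER = [0, 1, 1]
USERS = (0, 1)
N = 3

def user_codeword(user, u):
    """GF(2) codeword of one user; u holds one info bit per row of G."""
    word = [0] * N
    for row, owner, bit in zip(G, OWNER, u):
        if owner == user and bit:
            word = [(w + g) % 2 for w, g in zip(word, row)]
    return word

def real_sum(u):
    """Integer (real-adder) sum of all users' codewords for info bits u."""
    words = [user_codeword(k, u) for k in USERS]
    return [sum(col) for col in zip(*words)]

def transmit(user, u_levels):
    """Level-l code bits are mapped to the coset representatives 0 or 2**l."""
    x = [0] * N
    for level, u in enumerate(u_levels):
        c = user_codeword(user, u)
        x = [xi + (1 << level) * ci for xi, ci in zip(x, c)]
    return x

def decode(y, levels):
    """Peel off one GF(2)-adder channel per level, then pass on the residual."""
    y, out = list(y), []
    for _ in range(levels):
        z = [v % 2 for v in y]                    # GF(2)-adder channel output
        u = next(u for u in product((0, 1), repeat=len(G))
                 if [s % 2 for s in real_sum(u)] == z)
        out.append(u)
        y = [(v - s) // 2 for v, s in zip(y, real_sum(u))]   # residual channel
    return out

u_levels = [(1, 0, 1), (0, 1, 1)]                 # info bits at levels 0 and 1
y = [a + b for a, b in zip(transmit(0, u_levels), transmit(1, u_levels))]
recovered = decode(y, 2)
```

The residual channel at each step becomes usable only after the multiple-access code at the current level has been decoded, which is why the procedure requires multiple-access codewords on every F-adder channel.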

Our approach addresses the criticism raised by Gallager [5, page
124] when he observes that: "[There are] three bodies of research on
multiaccess channels, each proceeding in virtual isolation from the
others and each using totally different models." Gallager refers to
the fact that the research on multiple-access channels has
concentrated either on the bursty arrival of messages (collision resolution
research) or on the noise and interference aspects of the multiple-
access channel (information theoretic approach) but not on both.
The information theoretic approach does not take into account the
source model since one generally assumes sources producing
information at some average rate. Unfortunately, in order to see this average
rate one has to smooth out the source by averaging over a long time.
This introduces unacceptable delays. Our approach addresses both
aspects. In particular, the problem of bursty arrival of messages is
addressed by having multi-rate codes. If the source model is such
that one cannot guarantee that the sum-rate constraint is fulfilled at
all times, as when the arrival statistic is Poisson, then one can use
our multiple-access multi-rate codes with a random-access protocol
with multiple reception as described in [2]-[4].
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Abstract 

A new family of uniquely decodable binary codes is
presented for the T-user real adder channel. The
codes consist of T individual codebooks, each containing
only two codewords, one of which is the all-zero
sequence. These codes achieve a sum rate that is
equal, asymptotically in T, to the sum capacity. An
iterative decoding algorithm is also presented. Applications
are discussed to codes for T active users out
of M potential users, and to superimposed codes.

Summary 

The T-user real adder channel is a multiple access
channel with output $y = x_1 + x_2 + \cdots + x_T$, which
is the real sum of a set of binary input symbols, one
from each of T users. A T-user code for this channel
is a set $(C_1, C_2, \ldots, C_T)$ of binary block codes having
a common length N. The sum rate is $R_{sum}(T) =
R_1 + R_2 + \cdots + R_T$, where $R_i$ is the rate of the code
$C_i$. A T-user code is uniquely decodable (UD) if all of
the sums, formed by taking one codeword from each
user, are distinct.

Chang and Weldon [1] showed that the sum capacity
of this channel satisfies

$C_{sum}(T) \approx (1/2) \log_2 T$ as $T \to +\infty$.  (1)

They also presented a family of UD T-user codes for
which $R_{sum}(T)/C_{sum}(T) \to 1$ as $T \to +\infty$.

Our  investigation  of  T-user  codes  for  the  real  adder 
channel  is  motivated  by  an  interest  in  codes  for  T 
active  users  out  of  M  [2],  and  superimposed  codes 
[3],  for  which  the  present  codes  can  serve  as  building 
blocks.  In  this  application,  the  codes  of  [1]  are  inad¬ 
equate  because  the  vast  majority  of  users  have  only 
non-zero  codewords,  so  these  users  must  be  active  at 
all  times. 

We therefore consider T-user codes $(C_1, \ldots, C_T)$
of the following form: each code $C_i$ consists of only
two codewords, one of which is the all-zero sequence.
A T-user code of this form can be described by a T×N
binary generator matrix B, the i-th row of which is the
nonzero codeword of $C_i$.
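
For codes of this two-codeword form, unique decodability on the real adder channel means that all $2^T$ subset sums of the rows of B are distinct. A brute-force check can be sketched as follows; the two example matrices are small hypothetical ones chosen for illustration and are not members of the family constructed in this paper.

```python
from itertools import product

def is_uniquely_decodable(B):
    """B[i] is the nonzero codeword of user i; the other codeword is all-zero.

    Returns True when the real (integer) sums over all 2^T choices of
    active rows are pairwise distinct.
    """
    seen = set()
    n_cols = len(B[0])
    for active in product((0, 1), repeat=len(B)):
        s = tuple(sum(a * row[j] for a, row in zip(active, B))
                  for j in range(n_cols))
        if s in seen:
            return False
        seen.add(s)
    return True

B_good = [(1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]   # T = 4 users, N = 3
B_bad = [(1, 0), (0, 1), (1, 1)]                          # row 1 + row 2 = row 3

good = is_uniquely_decodable(B_good)
bad = is_uniquely_decodable(B_bad)
```

The first matrix gives a sum rate of T/N = 4/3, which already exceeds 1; the exhaustive check is exponential in T and is meant only for small examples.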

Supported  in  part  by  ARO  Grant  DAAL03-89-K-0130. 


We specify a family of T-user codes by recursively
constructing the corresponding generator matrix. The
first code is the trivial single-user code with $B_0 = [1]$
and $T_0 = N_0 = 1$. The rule for constructing $B_j$ from
$B_{j-1}$ is

[Equation (2): a block matrix giving $B_j$ in terms of $B_{j-1}$, $\bar{B}_{j-1}$, $I_{j-1}$, $O_{j-1}$, and all-zero and all-one tuples; the block layout is not recoverable from the source.]  (2)

where $\bar{B}_{j-1}$ is the one's complement of $B_{j-1}$, $I_{j-1}$
is the identity matrix of dimension $N_{j-1}$, $O_{j-1}$ is a
square, all-zero matrix with dimension $N_{j-1}$, $0_N$ is
the all-zero N-tuple, and $1_N$ is the all-one N-tuple.

Theorem: For any positive integer j, the matrix
$B_j$ in (2) defines a uniquely decodable binary $T_j$-user
code of blocklength $N_j$, where

$T_j = (j+1)2^j$ and $N_j = 2^{j+1} - 1$.  (3)

□

Note that as $T_j \to +\infty$,

$R_{sum}(T_j)/C_{sum}(T_j) \to 1$.  (4)

We  also  present  an  iterative  decoding  algorithm 
and  a  brief  description  of  new  T-out-of-M  user  codes 
and  superimposed  codes  which  can  be  constructed 
from  the  T-user  codes. 
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Abstract.  A  two-decodable  coding  scheme  for  the  two- 
user  binary  adder  channel  is  proposed,  where  the  first  code  of 
the  code  pair  is  constrained  to  a  class  of  linear  codes. 


1.  Introduction 

This paper deliberates on how to construct a two-
decodable code pair (C,S) for the two-user binary adder
channel. We use the fact that when the first code C is given
a priori, a maximum independent set of the 2-order graph
$G^{(2)}$ associated with C achieves the highest rate of the
second code S, which is proposed by Kasami and Lin in
1983 [1].

For a restricted model of C, it is possible to evaluate a
lower bound of the independence number of $G^{(2)}$ and to
propose a practical construction scheme of the two-decodable
code pair. The two-order graph associated with C is
decomposed into layers, each of which consists of mutually
isomorphic subgraphs. It is easy to calculate their
independence numbers [3]. The sum of the independence numbers of
non-adjacent subgraphs is the lower bound which we will
propose.


2.  Lower  bound 


The code C to be considered here is an (n,k) linear code
with a generator matrix $\Gamma = [I_k P_1 \cdots P_k]$, where $I_k$ is a k×k
identity matrix, and $P_j$ ($j = 1, \ldots, k$) is a $k \times x_j$ matrix with
all entries of 1 in the j-th row and entries of 0 elsewhere, and

$x_1 + x_2 + \cdots + x_k = n - k$, $x_j > 0$ [2].
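
The structure of this generator matrix is easy to see in a small instance. The sketch below builds $[I_k P_1 \cdots P_k]$ from a list of block widths $x_1, \ldots, x_k$; the function name and the example widths are assumptions made for illustration.

```python
def build_generator(x):
    """Return the k x n matrix [I_k P_1 ... P_k] as a list of rows.

    x[j] is the width of block P_j, which is all-ones in row j and zero
    elsewhere, so n = k + sum(x).
    """
    k = len(x)
    if any(width <= 0 for width in x):
        raise ValueError("every block width must be positive")
    rows = []
    for j in range(k):
        row = [1 if col == j else 0 for col in range(k)]     # I_k part
        for block, width in enumerate(x):
            row += [1 if block == j else 0] * width          # P_block part
        rows.append(row)
    return rows

gamma = build_generator([2, 1])    # k = 2, n = 5
```

Row j has ones in column j and in the columns of block $P_j$ only, so each information bit is simply repeated $x_j + 1$ times.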

For the 2-order graph $G^{(2)}$, its vertex set $V = \{0,1\}^n$ can
be divided into the partition $V = V_0 \cup V_1 \cup \cdots \cup V_{2^{n-k}-1}$,
where $V_i$, $i = 0, 1, \ldots, 2^{n-k}-1$, is a coset of C. Let
$L_i = 0 \ldots 0\, a_{i1}^{(1)} \ldots a_{i1}^{(x_1)} \cdots a_{ik}^{(1)} \ldots a_{ik}^{(x_k)}$
be a coset leader of $V_i$.

Define $Q_i = \{\, l \mid a_{il}^{(1)} \ldots a_{il}^{(x_l)} \ne 0,\ l = 1, \ldots, k \,\}$. Let $\Gamma_i$ be a
subgraph of $G^{(2)}$ induced by $V_i$. The subgraph belongs to the
$|Q_i|$-th layer, and its independence number is equal to a power of 2
[exponent illegible in the source].

Put a set $m_i = \{m_{i1}, \ldots, m_{ik}\}$, where $m_{il}$ is the
number of "1" in the block $a_{il}^{(1)} \ldots a_{il}^{(x_l)}$. The number of $\Gamma_j$'s
such that $m_j = m_i$, $j = 0, 1, \ldots, 2^{n-k}-1$, is calculated as

$\prod_{l=1}^{k} \binom{x_l}{m_{il}}$.

In order to select mutually non-adjacent $\Gamma_i$'s, let
M be a subset of $\{m_i,\ i = 0, 1, \ldots, 2^{n-k}-1\}$ whose elements
satisfy one of the following two cases:

(a) If [condition illegible in the source], $l = 1, \ldots, k$, then [illegible].

(b) If there exists an index p such that $m_{ip} + m_{jp} = x_p$,
then [illegible].

The number of non-adjacent $\Gamma_i$'s depends on M. The above
arguments are summarized in the following theorem.

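Theorem: Let C be an (n,k) linear code with the
generator matrix $\Gamma$; then a lower bound of the independence
number of $G^{(2)}$ is given by

$\alpha(G^{(2)}) \ge \max_{M} \sum_{m_i \in M} 2^{[\ldots]} \prod_{l=1}^{k} \binom{x_l}{m_{il}}\, \phi(m_{il})$,

where the exponent of 2 is illegible in the source, and
$\phi(m_{il}) = 0.5$ if $m_{il}/x_l = 0.5$, and $\phi(m_{il}) = 1$ otherwise.

□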


3.  Conclusion 

The construction scheme of the two-decodable code pair
is proposed as follows: Let C be an (n,k) linear code with the
generator matrix $\Gamma$, and choose a set M that makes the
independent set larger. This independent set is the code S.
Thus the two-decodable code pair (C,S) is obtained. It is
confirmed that there exist generator matrices $\Gamma$ such that
the lower bounds are equal to $\alpha(G^{(2)})$.

The  authors  are  grateful  to  Dr.  H.  Harada  for  his 
instruction  in  this  paper. 
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Abstract 

A  method  of  transmitting  information  through  an  AWGN 
multiple  access  binary  adder  channel  (BAC)  will  be  addressed.  We 
shall  consider  the  following  procedure  of  access  to  the  channel: 
there  are  N  users  but  only  m,  m  <N,  users  are  active  (are 
transmitting  their  messages)  during  a  fixed  period  of  communica¬ 
tion;  the  transmission  is  completely  synchronized;  the  subset  of 
the  active  users  is  known  to  the  receiver.  Such  situation  will  be 
named  transmission  through  BAC  with  a  partial  access  (BACPA). 
If $N \gg m$ then time sharing is a very ineffective method for the
BACPA. Indeed, since the subset of active users is unknown to
each of the users, the time must be shared between all N users.
Consequently, the overall transmission rate is given by
$R_{ov} = (m/N) R \ll 1$, where R is the coding rate of each user. We shall show
that effective transmission through a BACPA can be realized by
means of linear codes. More specifically, we shall show that for
any noiseless BACPA it is possible to construct N linear codes
such that $R_{ov} = 1$ and the decoding error probability equals 0.
For  the  case  of  AWGN  BAC,  we  shall  show  that  transmission  by 
means  of  linear  codes  can  have  even  better  characteristics  than 
time  sharing. 

Summary 

The  following  situation  of  transmitting  information  through 
a  multiple  access  channel  is  addressed.  A  binary  real  adder  chan¬ 
nel  (BAC)  perturbed  by  additive  white  Gaussian  noise  (AWGN)  is 
considered. The BAC can be described as follows: the inputs
corresponding to the users are binary, i.e., the input alphabet of each
user is X = {-1,1}; the output of the channel at instance j is
equal to $y_j = \sum_{i=1}^{m} x_i^j + z_j$, where m is the number of active
users, $x_i^j$ is the input signal of the ith user at instance j and $z_j$,
j = 1,2,..., are iid Gaussian random variables. We shall consider
the following procedure of access to the channel: there are N
users but only m, m < N, users are active (are transmitting their
messages)  during  a  fixed  period  of  communication;  the  transmis¬ 
sion  is  completely  synchronized;  the  subset  of  the  active  users  is 
known  to  the  receiver.  Such  situation  will  be  named  transmission 
through  BAC  with  a  partial  access  (BACPA). 

We shall call the coding for BAC for the case N = m, i.e.,
when all existing users are simultaneously active, coding for
transmission through BAC with a complete access (BACCA). The
simplest, and usually used, method of transmission through
BACCA is time sharing. In that case, under the condition that all
users use codes with the same coding rate R, the information
transmission rate of each user is equal to R/m and the overall rate
$R_{ov}$ of transmission through the BACCA is given by $R_{ov} = R <
C_0 \le 1$, where $C_0$ is the capacity of the AWGN one-way channel
with binary input. It is known that there are coding methods for
the BAC for which $R_{ov} > C_0$ [1]. A remarkable fact is that the
capacity of the AWGN BAC is achieved by uniform distribution
on the inputs of the users (p(x=-1) = p(x=1) = 0.5).
Consequently, the capacity of the BAC can be attained on the ensemble
of binary linear codes. The possibility of employment of linear
codes simplifies considerably the coding, and frequently also the
decoding. It should be noted here that linear codes cannot realize
zero decoding error probability under transmission through
the BAC without noise with overall rate $R_{ov} > 1$ [2]. This implies
that the value of the decoding error probability for linear codes
can be worse than for nonlinear ones. But the simplicity of
realizing the coding and decoding can in many cases be an acceptable
price for such deterioration.

If $N \gg m$ then time sharing is a very ineffective method
for the BACPA. Indeed, since the subset of active users is
unknown to each of the users, the time must be shared between all N
users. Therefore, the overall transmission rate $R_{ov}$ is given by
$R_{ov} = (m/N) R \ll 1$, where R is the coding rate of each user. We
shall show that effective transmission through a BACPA can be
realized by means of linear codes. More specifically, we shall
show that for any noiseless BACPA it is possible to construct N
linear codes such that $R_{ov} = 1$ and the decoding error probability
is equal to 0.
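
To make the time-sharing penalty concrete, the following short sketch evaluates $R_{ov} = (m/N) R$ for illustrative numbers; the function name and the sample values of m, N and R are assumptions chosen only for this example.

```python
def time_sharing_overall_rate(m, N, R):
    """Overall rate when time is shared among all N users but only m are active."""
    if not 0 < m <= N:
        raise ValueError("need 0 < m <= N")
    return m * R / N

# Hypothetical numbers: 5 active users out of 100, each using a rate-1 code.
rate = time_sharing_overall_rate(5, 100, 1.0)
```

With these numbers the overall rate is only 0.05, compared with the overall rate of 1 claimed above for suitably constructed linear codes on the noiseless BACPA.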

Bounds  on  the  number  of  users,  N,  for  a  given  m  will  be 
presented.  In  the  case  of  AWGN  BAC,  any  m  active  users  share 
the  same  binary  linear  (n,k)  code  C.  This  means  that  the  code  of 
any  user  is  either  a  subcode  of  C  or  some  coset  of  the  subcode. 
We construct, by means of the random coset method [3], an upper
bound on the decoding error probability. This bound enables us to
show that, for the case of AWGN BAC, transmission by means of
linear codes outperforms, with respect to the decoding error
probability, time sharing schemes. Parameters that play a significant
role  in  determination  of  the  probability  of  decoding  error  will  be 
considered. 

References 


[1] R. Ahlswede, "Multi-way communication channels", Proc.
2nd Int. Symp. Inform. Theory, Tsahkadsor, Armenian
S.S.R., (1971), pp. 23-52, Publishing House of the
Hungarian Academy of Science, 1973.

[2] T. Kasami, S. Lin, V.K. Wei, S. Yamamura, "Graph
theoretic approaches to the code construction for the two-user
multiple-access binary adder channel", IEEE Trans.
Information Theory, vol. IT-29, no. 1, pp. 114-130, Jan. 1983.

[3] G. Poltyrev, "About improving the upper bound on error
decoding probability for codes with complicated structure",
Problemy Peredachi Informatsii, vol. 23, no. 4, pp. 5-18,
1987.




CODING FOR THE F-ADDER CHANNEL: TWO APPLICATIONS OF REED SOLOMON CODES


Rüdiger Urbanke  Bixio Rimoldi*
Washington  University 
Department  of  Electrical  Engineering 
Electronics  Systems  and  Signals  Research  Laboratory 
St.  Louis,  MO  63130,  USA 


Abstract 

Given  any  finite  field  F,  the  F-adder  channel  is  the 
channel  whose  inputs  are  elements  of  F  and  the  output 
is  the  sum  (over  F)  of  the  inputs.  It  is  shown  how  Reed 
Solomon (RS) codes can be used to obtain multiple-
access multiple-rate codes of convolutional type for the
F-adder channel. It is also shown that when the
F-adder channel is noisy, the codewords of a multiple-
access multi-rate code for the F-adder channel can be
protected in a simple and flexible manner by means of
RS codes.


I  Introduction 

In  a  companion  paper  [1]  multiple-access  multiple-rate  codes  for  the 
F-adder  channel  have  been  defined  and  two  constructions  of  such 
codes  have  been  given.  All  codes  in  [1]  are  of  block  type  (as  opposed 
to convolutional type). This paper presents two applications of Reed
Solomon (RS) codes to coding for the F-adder channel. The first
application results in multiple-access multi-rate codes of convolutional
type.  This  is  done  in  section  II.  In  section  III  we  assume  that  the 
F-adder  channel  is  noisy  and  show  how  to  combine  multiple-access 
coding  and  error  protection  in  a  flexible  way. 

II  Convolutional  Codes  for  the  F-Adder  Channel 

In [1] we explicitly assumed that the channel is the F-adder channel
where F is a finite field. One can easily verify that definitions and
constructions still apply if we replace F by R = F[D], the ring of
polynomials over F.^1 In this way, generator matrices over R
defining block type multiple-access codes over R can be seen as generator
matrices of convolutional type multiple-access codes^2 over F. Since
adding elements of R is the same as transforming these elements
into sequences over F, adding them componentwise, and
transforming back the resulting sequence to a ring element, one can use the
convolutional codes obtained in this way as multiple-access codes for
the F-adder channel (as opposed to the F[D]-adder channel).
Example 1 Consider the following 7×3 generator matrix.

$$H^T = \begin{pmatrix}
1 & 1 & 1 \\
1 & D & D^2 \\
1 & D^2 & D + D^2 \\
1 & 1 + D & 1 + D^2 \\
1 & D + D^2 & D \\
1 & 1 + D + D^2 & 1 + D \\
1 & 1 + D^2 & 1 + D + D^2
\end{pmatrix} \quad (1)$$

The elements of $H^T$ are over the polynomial ring R = F[D] with
F = GF(2). Assume that 7 users are sharing a binary-adder channel.
Assign to user i the rate 1/3 convolutional encoder having as
generator matrix the i-th row^3 of $H^T$. The receiver will be able to
decode the messages upon observation of the channel output, provided
that no more than 3 users are active and provided that the receiver
knows which users are active. This is true since any 3 rows of $H^T$ are
linearly independent. Notice that decoding is unique, provided that
the sum rate is not larger than unity. Multiple-access codes having
this property were denoted optimal in [1].

^1 More generally R could be a commutative ring with a unit element with
respect to multiplication and no zero divisors.

^2 We assume that the information sequences have finite length, so that they
can be represented by a polynomial.

^3 More generally, the number of rows assigned to users may vary in order to
account for users having dissimilar rate requirements.
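
The linear-independence claim in Example 1 can be checked mechanically. The sketch below assumes the matrix of Example 1 has rows $(1, \alpha^i, \alpha^{2i})$, $i = 0, \ldots, 6$, with $\alpha = D$ and powers reduced modulo $D^3 + D + 1$; this reading of the garbled layout, the bitmask representation of polynomials, and the function names are assumptions for illustration.

```python
from itertools import combinations

def pmul(a, b):
    """Carry-less product in GF(2)[D]; bit k is the coefficient of D^k."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

# Rows of the assumed 7 x 3 matrix, entries as GF(2)[D] polynomials.
H_T = [
    (0b001, 0b001, 0b001),   # 1, 1,         1
    (0b001, 0b010, 0b100),   # 1, D,         D^2
    (0b001, 0b100, 0b110),   # 1, D^2,       D + D^2
    (0b001, 0b011, 0b101),   # 1, 1 + D,     1 + D^2
    (0b001, 0b110, 0b010),   # 1, D + D^2,   D
    (0b001, 0b111, 0b011),   # 1, 1+D+D^2,   1 + D
    (0b001, 0b101, 0b111),   # 1, 1 + D^2,   1 + D + D^2
]

def det3(r0, r1, r2):
    """3 x 3 determinant over GF(2)[D]; in characteristic 2 all signs are +."""
    (a, b, c), (d, e, f), (g, h, i) = r0, r1, r2
    return (pmul(a, pmul(e, i) ^ pmul(f, h))
            ^ pmul(b, pmul(d, i) ^ pmul(f, g))
            ^ pmul(c, pmul(d, h) ^ pmul(e, g)))

all_independent = all(det3(*rows) != 0 for rows in combinations(H_T, 3))
```

A nonzero determinant for each of the 35 row triples means that any 3 active users can be separated, provided the receiver knows which users are active.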

The key to obtaining optimal multiple-access codes of convolutional
type for the F-adder channel lies in the ability to find an n × (n − k)
matrix over R = F[D] with the property that all collections of (n − k)
rows are linearly independent. The values of n and (n − k) are design
parameters that depend on the number of users and on how many of
them are allowed to be active concurrently.

Let F = GF($p^m$) be any finite field. Let E = GF($p^{ml}$) be any finite
extension field of F. Let n divide $p^{ml} - 1$. Let $H^T$ be the transposed
parity check matrix of an (n, k) RS code over E. Then it is readily
checked that $H^T$ generates an optimum convolutional type multiple-
access code, if we view its elements as polynomials over F. We note
that the specific binary example presented above has been derived
by this procedure with F = GF(2), l = 3, n = $2^3 - 1$ and k = 4. A
similar construction method for such matrices has been proposed in
[2] (there the motivation was to find channel correcting convolutional
codes).

III Error Control Coding via RS Codes

Let $H^T$ be the transposed parity check matrix of an (n, k) RS code
over GF($p^m$). Assume that we want to construct a multiple-access
code to operate over a noisy F-adder channel and hence need an
error correcting scheme to secure the multiple-access codewords. Let
(n − k), the length of the multiple-access codewords, divide $p^m - 1$,
and assume that we want to be able to correct up to t errors per
block.

Partition the n rows of $H^T$ into generator matrices and let $B^i$ be the
generator matrix of user i. Let $c^i$ be the encoded message of user i.
Before transmission, user i sets the last 2t components of $c^i$ to zero
and takes the (inverse) Fourier transform. Clearly the resulting outer
code will be an RS code with error correcting capability t (the last
2t frequency components of c are zero). This scheme works because
the transposed parity check matrix $\tilde{H}^T$ of an (n, k + 2t) RS code can
be derived from the transposed parity check matrix $H^T$ of an (n, k)
RS code by deleting the last 2t columns. Hence by setting the last 2t
components of $c^i$ to zero and taking the Fourier transform we actually
base the multiple-access codes on an (n, k + 2t) RS code and embed
this in an outer (n − k, n − k − 2t) RS code for the error correction.
Needless to say, t may take on any value $0 \le t \le (n-k)/2$. This error
correction scheme does not decrease the number of available rows of
$H^T$, but reduces the maximum possible sum rate by a factor $(n-k-2t)/(n-k)$.
We see that this error correction scheme allows a flexible choice
between low error probability and high sum rate.
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Abstract

When estimation of signal waveform or signal parameters takes place under
prior uncertainty as to whether or not the signal is actually present [$p(H_1) < 1$, with
$q(H_0) > 0$], estimation based on the assumption that $p(H_1) = 1$ can result in
estimates that are seriously in error. Moreover, such estimators and estimates
are themselves biased, with unknown bias, if the "noise only" ($q(H_0) > 0$) state is
not properly accounted for.

The present paper extends the original analyses of Middleton and Esposito
[1a,1b], and more recent work of the present author [2], to include canonical
estimation in generalized noise for least mean square error (LMSE) estimators and
for (unconditional) maximum likelihood estimators (UMLE's), which last were not
available before in the case of the UMLE. [In addition, the verbal presentation
includes new threshold results, obtained for correlated noise samples (the author's
so-called quasi-equivalent (QE) noise models), where only the first order pdf $w_1(z)$
and covariance of the noise process are available [2].]


Summary

In many practical signal processing situations where estimation in noisy
environments of signal waveform (S) or parameters ($\theta$) is required (e.g.,
classification and localization of targets, measurements, remote sensing, etc.), it is
often not known a priori whether or not the desired signal is present. Detection and
estimation are then jointly required, so that both a correct estimator, i.e., one that is
unbiased and optimal, can be constructed, and an associated (optimal) detector
employed, which in turn can "validate" (i.e., accept or reject) the estimate. The a
priori probabilities are denoted by $q(H_0)$, $p(H_1)$, with $q + p = 1$, $0 \le (q,p) \le 1$, and
the desired (optimum) estimators, $\gamma^*$, are denoted by $\gamma^*_{p<1}$ here, with
$\langle \gamma^*_{p<1} \rangle_H = p\,\langle\theta\rangle$, where $H = H_0 + H_1$, for these estimators to be unbiased.

As usual, optimality is defined in terms of minimum average "risk" or cost. [New
canonical results for jointly optimum threshold D and E in generalized noise, for
both coherent and incoherent reception, when the noise samples are correlated, are
discussed in the presentation.]

Furthermore, we consider weak coupling only, between detector and estimator,
with the added simplification that in detection the cost of declaring the signal
absent when the signal is present, and vice versa, is independent of the signal.

Then detection and estimation can be carried out independently, in parallel, with
the convention that the (optimum) estimator $\gamma^*_{p<1}$ is rejected if the probability of
correct detection [$P_D = p(H_1)(1-\beta)$] is smaller than some preselected value.

Failure to account for the fact that $p(H_1) < 1$ can have serious consequences in
applications of the estimator.

1. Optimum LMSE Estimators, $p(H_1) < 1$

Here we shall summarize the principal results recently obtained (see also [1],
[2], and refs.) for the quadratic cost function (QCF), which leads to LMSE
estimators. We have, generally, the optimum estimator

$\gamma^*_{p<1|QCF} = \left[\frac{\Lambda_J}{1 + \Lambda_J}\right] \gamma^*_{p=1|QCF}$, cf. (3.7), [1],  (1.1)

where $\Lambda_J$ is the generalized likelihood ratio:

$\Lambda_J \equiv \mu \langle F_J(x|S(\theta)) \rangle_\theta / F_J(x|0)$;  $\mu \equiv p(H_1)/q(H_0)$;  $x = \{x_j\}$, $J = MN$, $j = (m,n)$,

with space (m,M)-time (n,N), m = 1, ..., M; n = 1, ..., N; j = m,n, sampling; x =
normalized data vector of MN = J elements, and $F_J(x|S)$ = J-fold pdf of x, the
generalized noise given the signal (vector) $S = [S_j]$, etc. The associated Bayes risk
(= $C_0 \times$ LMSE) is expressed formally as


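$$R(\sigma, \gamma^*)\big|_{p<1:\mathrm{QCF}} = C_0\,\overline{|\gamma^* - \theta|^2} = C_0 \left\langle q\,\big(\gamma^*_{p<1|\mathrm{QCF}}\big)^2 + p\,\big|\theta - \gamma^*_{p<1}\big|^2 \right\rangle_{\mathrm{QCF}} \tag{1.2}$$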
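The weighting in (1.1) can be illustrated on a scalar toy problem. The sketch below is not the general result of the text: it assumes a single observation, unit-variance Gaussian noise, and a zero-mean Gaussian prior on the parameter (all hypothetical choices), and it assumes the $p < 1$ estimator is the $p = 1$ estimator scaled by $\Lambda/(1+\Lambda)$, which for this toy model is the posterior probability of $H_1$.

```python
import math


def gaussian_pdf(x, var):
    """Zero-mean Gaussian density with variance `var`."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def lmse_estimates(x, p, var_theta):
    """Return (estimate assuming p = 1, estimate for prior 0 < p < 1).

    Toy model (an assumption of this sketch):
      H0: x = n,          n ~ N(0, 1)
      H1: x = theta + n,  theta ~ N(0, var_theta)
    """
    q = 1.0 - p
    # Conditional-mean (LMSE) estimate when the signal is surely present.
    est_p1 = var_theta / (1.0 + var_theta) * x
    # Generalized likelihood ratio: (p/q) * <F(x|theta)>_theta / F(x|0).
    lam = (p / q) * gaussian_pdf(x, 1.0 + var_theta) / gaussian_pdf(x, 1.0)
    # Weight by lam / (1 + lam), the posterior probability of H1.
    est_p_lt_1 = lam / (1.0 + lam) * est_p1
    return est_p1, est_p_lt_1


est1, est = lmse_estimates(x=2.0, p=0.5, var_theta=4.0)
```

In this toy case the $p < 1$ estimate is always pulled toward zero relative to the $p = 1$ estimate, and the two coincide as $p \to 1$.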

Specifically,  the  general  result  7p=i|qj,p  for  estimating  the  9|^,m  =  l,...,M,out 
of  =  0  parameters  is  given  by; 


^p='Iqcp 


(1.3) 


2. Optimum UML Estimators ($p(H_1) < 1$)

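Other new results recently obtained by the author concern the UMLEs, which are derived from an appropriate "simple" cost function (SCF). We use a "strict" form, which now yields a Bayes risk

$$R_\varepsilon(\sigma,\gamma^*)\big|_{\mathrm{SCF}} = C_0\Big\{A_M - \int_\Gamma \Big[q\,F_J(x|0)\,\delta\big(\gamma^*_{p<1}(x) - 0\big) + p\int_\Omega F_J(x|S(\theta))\,\delta\big(\gamma^*_{p<1}(x) - \theta\big)\,\sigma(\theta)\,d\theta\Big]\,dx\Big\} \tag{2.1}$$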


Maximizing  the  integrand  of  Jp(  )dx,  minimizes  Re  (=  R^)  ;  e.g.,  the  extremal 
condiuon determining  Y^jC*)  ^{8{t-0)  +  P F(x|S(Y)a(Y))}^_^^.  =0. 

Using the fact that $\int_{\Gamma_0} F_J(x|0)\,dx = \int_\Gamma F_J(x|0)\,\delta\big(\gamma^*_{p=1}(x) - 0\big)\,dx = q_M(0|H_0)$, where $q_M(\gamma|H_0)$ is the $M$-fold pdf of $\gamma^*_{p=1} = \gamma = [\gamma_1, \ldots, \gamma_M]$ and $\Gamma_0$ is the domain of $x$ for which $\gamma^*(x)_{p=1} = 0$, while $\Gamma$ = domain of all $x$, and the requirement that $\gamma^*_{p<1}$ must be unbiased, gives, after some manipulation, the desired (new) result for the optimum estimator here:

Vl(»)|scF  =Yp=l<’‘W.  xeI']:Yp=i(x)’‘0 

=  -Yp=](x)scF”“/[l  +  Aj(x)]q^(olHo)  .  x6ro:Y*  ,(x)  =  0 

I  p<l 


(2.2) 


T^l  found  in  the  usual  way  [la]. 


ae 


{logo(e)-K(>0)(x|e)scF}^^^.  =0.  (2.3) 


where  k  log  ^Fj(x|S(0,0'))^g,  -  log  F](x|0). 

The associated Bayes risk is obtained here by inserting (2.2) back into (2.1) and employing integration procedures like the above. For $p = 1$, only the first term of (2.2) applies.


3.  Concluding  Remarks 

As noted at the beginning, failure to account for the fact that $p(H_1) < 1$ can not only lead to erroneous (and biased) estimates, but these can also be sufficiently inaccurate as to have serious consequences. For example, a difference in, say, the mean estimate of threshold signal amplitude (power) of -10% vis-à-vis the correct ($p < 1$) value, corresponding to $p = 0.9$ (vs. $p = 1$), produces an error of 10% in the average minimum detectable signal. This can be 2.5 to 3.0 dB for -25 dB or -30 dB for the latter: serious amounts, for instance, in "Matched Field Processing", where one tries to keep signal degradation below 1 dB for effective matching of the propagation model to the received data.
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where  /(21)  _  log^Fj(x|S(0^,0  ))^g,  -  log(Fj(x|S))g,  with  0  =  (0,0  ),  0  =  [0|{|].  t  Equations  (3.15),  (3.16),  [la]  contain  unnecessary  integrals  over  V, 
cf.  (3.8),  [1].  5'"“  Fn,  w(Y).  6(y-V)  2  0. 
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Abstract 

The problem of detecting one out of M fading signals in a spherically symmetric noise process is addressed. We show that the classical detector is canonically optimum regardless of the fading and noise models. An example is worked out for the case of Middleton Class-A distributed noise and Nakagami fading.

Noise  and  fading  models 

A compound-Gaussian process [1] can be thought of as the product of a modulating non-negative, wide-sense stationary process, $s(t)$ say, and a Gaussian, possibly complex, one, $g(t)$ say, independent of $s(t)$, namely $c(t) = s(t)g(t)$. Obviously, not all processes are amenable to such a representation; precisely, the admissibility condition that the common distribution of the quadrature components of the noise should fulfill is

$$f_{c_I}(x) = f_{c_Q}(x) = \int_0^\infty \frac{1}{\sqrt{2\pi\sigma^2 s^2}}\, e^{-x^2/(2\sigma^2 s^2)}\, f(s)\, ds \tag{1}$$

where $\sigma^2$ is the common variance of the quadrature components of the Gaussian process and $f(s)$ is the first-order pdf of the random process $s(t)$. Among the marginal pdf's complying with (1) we cite the Middleton Class-A distribution, the Generalized Gaussian, the Generalized Cauchy, and the Generalized Laplace [2]. In keeping with theoretical considerations, supported by experimental evidence, we assume that the bandwidth of $s(t)$ is much smaller than that of $g(t)$; so, on sufficiently short time intervals, the modulating process is practically a random constant and the overall noise process degenerates into a spherically symmetric random process. When such a model is in force, the spectral properties of the process reproduce, except for a scale factor, those of the Gaussian noise.
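The "random constant" picture above can be sketched in a few lines: within one short block a single value of the modulating variate multiplies every complex Gaussian sample. The function name `noise_block` and the texture law used in the usage line are hypothetical choices for illustration, not taken from the text.

```python
import random


def noise_block(n, sigma, draw_s, rng):
    """One short block of compound-Gaussian noise c = s * g.

    s is drawn once per block (a random constant over the block); g is
    complex Gaussian with variance sigma**2 per quadrature component.
    """
    s = draw_s(rng)  # non-negative modulating variate, fixed within the block
    return [s * complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for _ in range(n)]


# Hypothetical texture law: square root of a unit-mean exponential variate.
block = noise_block(8, 1.0, lambda r: r.expovariate(1.0) ** 0.5, random.Random(1))
```

Conditioned on the drawn value of `s`, the block is ordinary Gaussian noise with scaled power, which is why performance in this model can be obtained by averaging Gaussian-noise results over `s`.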

As to the channel, we assume the flat-flat fading model: the useful received signals are hence related to the transmitted waveforms through the complex factor $\alpha = A e^{j\phi}$, with $A$ (the random gain of the channel) arbitrarily distributed and $\phi$ (the received phase) uniformly distributed in $[0, 2\pi)$.


Synthesis  of  the  optimum  detector 
The $M$-ary detection problem can be stated as


hypothesis  that  maximizes  the  function 


1 

(2x5Wo)^^ 


/o 


(3) 


where the bar denotes expectation with respect to $A$ and $s$, $I_0(\cdot)$ is the modified Bessel function of the first kind and order zero, and $2N_0$ is the noise power spectral density. It can be shown that this is equivalent to maximizing $|r \cdot p_i|$. Thus, the optimum receiver is the classical minimum-distance detector, regardless of the first-order distributions of the noise and of the fading. As to the correlated case, it can be easily managed by introducing a linear filter which whitens the received observations [2].


Receiver  performance  in  Nakagami  fading 

The performance of the above receiver in non-Gaussian noise can be evaluated by simply averaging $s$ out of $P(e|s)$, the error probability under Gaussian noise.

Consider the Nakagami $m$-distribution, namely

$$f_A(a) = \frac{2\, m^m}{\Gamma(m)\, A_{\mathrm{rms}}^{2m}}\; a^{2m-1}\, e^{-m a^2 / A_{\mathrm{rms}}^2}, \quad a \ge 0,$$

where $A_{\mathrm{rms}}$ is the channel root mean square gain, and $m$ is a shape parameter ruling the fading depth. For the case of $M$ orthogonal signals embedded in noise with Middleton Class-A pdf, we obtain


$$P(e) = \sum_{i=0}^{\infty} w_i \sum_{k=1}^{M-1} \binom{M-1}{k} \frac{(-1)^{k+1}}{k+1} \left[\frac{s_i^2\, m\,(k+1)}{\gamma_R\, k + s_i^2\, m\,(k+1)}\right]^m$$
where $w_i = e^{-A} A^i / i!$, $A$ is a shape parameter, $s_i^2 = (i/A + \Gamma)/(1 + \Gamma)$, $\Gamma$ is the ratio of the power of the Gaussian component to that of the impulsive one, and $\gamma_R$ denotes the signal-to-noise ratio (SNR). Results indicate that when deep fading is present (e.g., $m = 1$) the shape parameter of the noise has almost no influence, while, for increasing $m$, spikier noise results in worse performance; however, the detection loss, as measured with respect to the Gaussian case, approaches a constant value (depending on $m$) as the SNR diverges.
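The averaging recipe described above (compute the Gaussian-noise error probability for a given noise scale, then average over the Class-A mixture) can be sketched numerically. The code assumes the standard noncoherent error expression for $M$ orthogonal signals averaged over Nakagami-$m$ fading, Poisson mixture weights $e^{-A}A^i/i!$, and scale factors $(i/A + \Gamma)/(1 + \Gamma)$; these forms, the function names, and the truncation length `terms` are assumptions of this sketch.

```python
import math


def pe_given_scale(gamma_r, m, M, s2=1.0):
    """Error probability for M orthogonal signals in Nakagami-m fading and
    Gaussian noise whose power is scaled by s2 (gamma_r = average SNR)."""
    total = 0.0
    for k in range(1, M):
        sign = 1.0 if k % 2 == 1 else -1.0  # (-1)**(k+1)
        ratio = s2 * m * (k + 1) / (gamma_r * k + s2 * m * (k + 1))
        total += sign * math.comb(M - 1, k) / (k + 1) * ratio ** m
    return total


def pe_class_a(gamma_r, m, M, A, Gamma, terms=60):
    """Average pe_given_scale over a truncated Class-A (Poisson) mixture."""
    weight = math.exp(-A)  # Poisson weight for i = 0
    total = 0.0
    for i in range(terms):
        s2 = (i / A + Gamma) / (1.0 + Gamma)
        total += weight * pe_given_scale(gamma_r, m, M, s2)
        weight *= A / (i + 1)  # next Poisson weight
    return total
```

As a sanity check, for binary signalling in Rayleigh fading ($M = 2$, $m = 1$) and Gaussian noise the first function reduces to $1/(2 + \gamma_R)$.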

$$H_i : \; r = \alpha\, p_i + c, \quad i = 1, 2, \ldots, M \tag{2}$$

where $r$, $p_i$, $c$ are complex, $N$-dimensional vectors representing the corresponding waveform signals as $N$ diverges. For equally likely hypotheses and signals with equal energy $E_p$, minimizing the error probability requires choosing the
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SUMMARY 

We consider the problem of detecting time-frequency concentrated transient signals in white Gaussian noise. There are many instances in which the detection of time-frequency concentrated transients can be useful: any case in which the class of transients to be detected is known to have a certain time-frequency signature, but the exact time samples are not known. Examples of such classes are speech and animal sounds, sonar and radar return pulses, seismic signals, and underwater acoustic transients.

Recent  study  of  the  Weyl  correspondence  as  it  relates 
to  time-frequency  representations  [1,2]  has  been  fruitful 
in  that  a  way  to  associate  a  function  in  the  time-fre¬ 
quency  plane  (a.k.a,  "Phase  Space"  or  the  "Wigner 
plane")  with  a  linear  operator  has  been  suggested  and 
studied.  This  linear  operator  has  several  useful  proper¬ 
ties;  as  shown  in  [1],  it  is  a  self-adjoint  operator  provided 
the  function  in  the  Wigner  plane,  or  the  "symbol"  as  it  is 
called,  to  which  it  corresponds  is  real.  Also  a  real  sym¬ 
bol  can  be  reconstructed  by  taking  a  weighted  sum  of  the 
Wigner  distributions  of  the  eigenfunctions  of  its  corre¬ 
sponding  linear  operator  (with  the  eigenvalues  as  the 
weights). 

One  of  the  properties  of  the  Weyl  correspondence 
most  important  in  the  application  of  detection  of  time- 
frequency  concentrated  signals  is  the  fact  that  the  (dou¬ 
ble)  integral  of  the  symbol  multiplied  by  the  Wigner  dis¬ 
tribution  of  a  signal,  i.e.  their  inner  product,  is  equal  to 
the  inner  product  of  the  image  of  the  signal  under  the 
symbol's  corresponding  operator  and  the  signal  itself. 
That is, if $P(t,f)$ is the symbol (a function of time and frequency), and $W_x(t,f)$ is the (auto) Wigner distribution of the signal $x$, then the following equality holds:

$$\iint P(t,f)\, W_x(t,f)\, dt\, df = \langle L_P x, x \rangle$$

where $L_P$ is the linear operator corresponding to the symbol $P$. The utility of this property is this: we define a signal as concentrated in a region if the integral of its Wigner distribution over the region is large [3]. If the symbol is a "mask", or the indicator function of some region $R$ in the time-frequency plane, then the Rayleigh quotient $\langle L_P x, x\rangle / \langle x, x\rangle$ can be considered the degree of concentration of $x$ in the region $R$. Hence the problem of detecting transient signals that have a large degree of concentration in a particular time-frequency region becomes one of detecting signals for which the Rayleigh quotient $\langle L_P x, x\rangle / \langle x, x\rangle$ is larger than some positive threshold. Such a class of signals is called a cone class and our detection problem is to detect signals in such a set.

Our solution to this problem is an application of the generalized likelihood ratio test (GLRT) assuming the following two hypotheses:

$$H_0 : r = w$$
$$H_1 : r = w + s$$

where $w$ is a zero-mean Gaussian noise process of known variance (with covariance matrix $\sigma^2 I$), and $s$ is an unknown signal in the cone class defined by

$$C_\rho(P) = \left\{ x : \frac{\langle L_P x, x\rangle}{\langle x, x\rangle} \ge \rho \right\}.$$

The generalized (log) likelihood ratio is

$$\Lambda(r) = \langle r, r\rangle - \langle r - \hat{s},\, r - \hat{s}\rangle$$

where $\hat{s}$ is the closest approximation to $r$ in $C_\rho(P)$. Detector performance is analyzed for a class of random concentrated signals, and for a class of acoustic well-logging signals. As the concentration level $\rho$ approaches the largest eigenvalue $\lambda_1$ of the operator $L_P$, the problem approaches subspace detection [4, pp. 145-147] in the eigenspace of $\lambda_1$.
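Two quantities from this discussion are easy to sketch for finite-length signals: the Rayleigh quotient used as a concentration measure, and the likelihood-ratio statistic in the limiting subspace case. The sketch assumes the operator is available as a small Hermitian matrix and that the dominant eigenspace is one-dimensional, spanned by a unit vector `u`; both are simplifying assumptions, and projecting onto a general cone class is not attempted here.

```python
def inner(x, y):
    """Inner product <x, y> = sum of x_n * conj(y_n)."""
    return sum(a * b.conjugate() for a, b in zip(x, y))


def rayleigh_quotient(L, x):
    """<Lx, x> / <x, x> for a Hermitian matrix L given as a list of rows."""
    Lx = [sum(L[i][j] * x[j] for j in range(len(x))) for i in range(len(L))]
    return (inner(Lx, x) / inner(x, x)).real


def glrt_subspace(r, u):
    """<r, r> - <r - s_hat, r - s_hat>, with s_hat the projection of r
    onto the line spanned by the unit vector u."""
    coeff = inner(r, u)
    s_hat = [coeff * un for un in u]
    resid = [rn - sn for rn, sn in zip(r, s_hat)]
    return (inner(r, r) - inner(resid, resid)).real
```

In the one-dimensional case the statistic equals the squared magnitude of the correlation between the data and `u`, so thresholding it is a matched-subspace energy test.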

The theory is applied here to the discrete-time, discrete-frequency case, where signals are parameterized completely by a finite number of samples. The Wigner distribution and the linear operator corresponding to an arbitrary symbol can be computed without approximation in this case; properties of each are verified.

Quickest change detection has a wide variety of applications, including search radar, digital signal processing, image processing, monitoring communication channels, and fault detection [1, 2]. In this paper a modified Shiryayev criterion [9] is used to study the problem of quickest detection of an abrupt change in a random sequence with independent and identical distributions before and after the change. The modified Shiryayev criterion minimizes the expected delay

$$D_m = E[N - m + 1 \mid N \ge m] \tag{1}$$

subject to a given false alarm probability

$$\alpha = P(N < m) \le a \tag{2}$$

and a given false alarm average run length (ARL)

$$N_{fm} = E(N \mid N < m) \ge r \tag{3}$$

where $N$ is a random stopping variable, $m$ is a non-random unknown change-time, and $a$ and $r$ are two constants. It is noted that if Eq. (3) is ignored and $m$ is random, then the above criterion is identical to Shiryayev's [9]. The difference between the modified Shiryayev criterion and the criteria used in [6, 7] is that the former considers the false alarm probability while the latter do not. The modified Shiryayev criterion has important applications in a number of situations where a change occurs in a finite time. It has been shown [9] that a procedure of Girshick-Rubin and Shiryayev (the GRS procedure) is optimal when the unknown change-time is geometrically distributed, where the Shiryayev criterion is used. In this paper the unknown change-time is assumed to be any non-random integer instead of the geometrically distributed random variable assumed by Shiryayev [9].

A new procedure is proposed in this paper to approach the problem defined above, which minimizes an asymptotic risk (a linear combination of $D_m$, $\alpha$ and $N_{fm}$) and is referred to as a minimum asymptotic risk (MAR) procedure. The decision statistic (MAR statistic)

is 

■  ,  ^  n  "i  lC(j  -  (  1  -  S’,!  I 

'•<^1  )  =--  .  I - (■') 

'n'O  +  1  -  ttO 

where $x_1^n = (x_1, \ldots, x_n)$ are the observations, $c_0$ and $r_0$ are design parameters, and $T_n$ is the well-known CUSUM statistic. The CUSUM statistic can be written as $T_n = \max\{T_{n-1}, 1\}\, f_1(x_n)/f_0(x_n)$, where $f_0$ and $f_1$ are the densities before and after the change, respectively. The decision rule of the MAR procedure is that one stops and decides a change has occurred as soon as

r(  j-’,' )  ).  (."i) 
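The CUSUM recursion on which the MAR statistic is built can be sketched directly. The code below implements only a plain CUSUM stopping rule in multiplicative likelihood-ratio form, not the MAR statistic itself; the starting value of 1, the Gaussian mean-shift model, and the function names are assumptions made for illustration.

```python
import math


def cusum_stop(samples, likelihood_ratio, threshold):
    """Return the 1-based time at which T_n = max(T_{n-1}, 1) * LR(x_n)
    first reaches `threshold`, or None if it never does."""
    t = 1.0  # assumed starting value
    for n, x in enumerate(samples, start=1):
        t = max(t, 1.0) * likelihood_ratio(x)
        if t >= threshold:
            return n
    return None


def gaussian_shift_lr(x, mu=1.0):
    """Likelihood ratio f1/f0 for N(mu, 1) against N(0, 1)."""
    return math.exp(mu * x - 0.5 * mu * mu)
```

The `max(., 1)` reset keeps the statistic from drifting far below one before the change, which is what limits the detection delay once the post-change samples arrive.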

It has been shown [1] that the MAR procedure is asymptotically optimal. The optimality is formally expressed as the following theorem:

Theorem 1 When (i) the one-sample likelihood ratio $\to 1$ for all $i$, or (ii) the false alarm probability $\alpha$ goes to zero, we have


subject  to 


where $D_{Rm}$ and $D_m$ denote the expected delay for the MAR procedure and any other procedure, $N_{Rfm}$ and $N_{fm}$ denote the corresponding false alarm ARLs, and $\alpha_R$ and $\alpha$ the false alarm probabilities.

For nonasymptotic situations, simulation results reported in [4, 5] reveal that the MAR procedure compares favorably with the CUSUM, GRS and moving-window fixed-sample-size procedures, where the modified Shiryayev criterion was used.

We have also observed that the MAR procedure is very insensitive to the choice of design parameter $c_0$. It can be shown [4] that $c_0$ =
satisfies  Theorem  1. 
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Abstract 

N separated sensors transmit noisy observations of a source to a central processor for final estimation; local encoding of each observation is performed prior to transmission. The issue is to devise encoding-decoding schemes for this decentralized context. The present scheme is composed of N scalar quantizers, each with a corresponding decoder, and a linear combination of individual estimates. This structure ensures that the decoder size increases only linearly with N. Through an example involving 2 sensors with a Gaussian model of source and observations, this scheme is compared to a previously considered unconstrained decoding scheme and to a distortion-rate bound.

Problem  statement 

With reference to Figure 1, $X$ is a random variable representing the source and $Y_1, \ldots, Y_N$ are noisy, possibly correlated, observations taken at each of $N$ separated sensors. Due to communications constraints, $Y_i$ is compressed by a scalar quantizer $q_i$ of rate $R_i$ producing index $Z_i$, $i = 1, \ldots, N$. Quantizers cannot share their data, hence they do not cooperate. The indices are transmitted to a central processor. Reproduction of the remote source is accomplished by

1. A bank of $N$ decoders $g_i$, each producing an estimate $\hat{X}_i$ of the remote source based on $Z_i$.

2. Linear combination of individual estimates to yield the final estimate $\hat{X} = v^T \hat{\mathbf{X}}$, where $\hat{\mathbf{X}} = (\hat{X}_1, \ldots, \hat{X}_N)^T$.

Figure 1: Schematic of the encoding-decoding system.

This structure ensures that the size of the decoding table is on the order of $2^{R_1} + \cdots + 2^{R_N}$, hence increases linearly with $N$; in contrast, if no such structure were imposed (as in [1]), the size would be $2^{R_1} \times \cdots \times 2^{R_N}$, hence would increase exponentially with $N$. Therefore, the problem is:

given: the source density $p(x)$, the observation model $p(y|x)$ and a pointwise distortion measure $d(x, \hat{x})$,

find: $N$ pairs $(q_i, g_i)$ of encoding-decoding rules, and a weighting vector $v$,

such that: the average distortion $D = E\,d(X, \hat{X})$ is as small as possible.

Decoupled  solution 

A simple suboptimum approach is to isolate the design of the $(q_i, g_i)$ pairs, $i = 1, \ldots, N$, from the selection of $v$. Design of the pair $(q_i, g_i)$ can be accomplished as in unifilar encoding-decoding of a remote source, namely alternatingly improving $q_i$ into $q_i'$ and $g_i$ into $g_i'$ according to

$$q_i'(y_i) = \arg\min_{z_i} \int_{\mathcal{X}} d\big(x, \hat{x} = g_i(z_i)\big)\, p(x, y_i)\, dx \tag{1}$$

$$g_i'(z_i) = \arg\min_{\hat{x}_i} \int_{\mathcal{X}} d(x, \hat{x}_i)\, p\big(x \mid z_i = q_i(y_i)\big)\, dx \tag{2}$$

Upon reception of the collection $Z_i$, $i = 1, \ldots, N$, a bank of $N$ decoders, one for each encoder, produces centrally the vector of the individual optimal estimates $\hat{X}_i$, $i = 1, 2, \ldots, N$, according to (2). The optimum weighting vector can be found as that vector which minimizes the distortion $D = E_{XZ}\, d\{X, v^T \hat{\mathbf{X}}(Z)\}$.

Example 

To assess the proposed scheme, also in comparison with the unconstrained scheme [1], we adopt the quadratic distortion measure $d(x, \hat{x}) = (x - \hat{x})^2$ and we consider the case of a Gaussian source, $X \sim \mathcal{N}(0, 1)$, and 2 sensors affected by additive, zero-mean Gaussian noise with common variance $\sigma^2$ and correlation coefficient $\rho$. The optimum weighting vector is then $v^T = E[X \hat{\mathbf{X}}^T]\,\Sigma^{-1}$, where $\Sigma$ is the covariance matrix of $\hat{\mathbf{X}}$.
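The weighting step can be illustrated for the two-sensor Gaussian example. As a simplifying assumption, the sketch lets the unquantized observations $Y_i = X + N_i$ stand in for the decoder outputs, so it shows only the linear-combination step and ignores quantization; the function name is hypothetical.

```python
def optimal_weights_two_sensors(sigma2, rho):
    """LMSE weights for X_hat = v1*Y1 + v2*Y2, with X ~ N(0, 1) and
    Y_i = X + N_i, noise variance sigma2, noise correlation rho (|rho| < 1).

    Assumption: the Y_i replace the quantized estimates used in the text.
    """
    # Covariance matrix of (Y1, Y2) and cross-covariances E[X * Y_i].
    s11 = s22 = 1.0 + sigma2
    s12 = 1.0 + rho * sigma2
    c1 = c2 = 1.0
    # Solve the 2x2 normal equations Sigma * v = c by Cramer's rule.
    det = s11 * s22 - s12 * s12
    v1 = (s22 * c1 - s12 * c2) / det
    v2 = (s11 * c2 - s12 * c1) / det
    mse = 1.0 - (c1 * v1 + c2 * v2)  # residual mean square error
    return (v1, v2), mse


weights, mse = optimal_weights_two_sensors(sigma2=1.0, rho=0.0)
```

Under these assumptions the residual error equals $\sigma^2(1+\rho)/(2+\sigma^2(1+\rho))$, so negatively correlated sensor noise is the easiest case for the linear combiner when no quantization is involved.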

Results are presented in Table 1, which shows the performance of the present scheme (labelled as I), the performance of the unconstrained scheme [1] (labelled as II) and a distortion bound for decentralized schemes [1] (labelled as OPTA). Notice that the loss of the present scheme becomes significant only for highly adverse correlation and for high variance ratio $\gamma$.

Table 1

Distortion results ($R_1 = R_2 = 3$).

           γ = 0 dB                 γ = 20 dB
   ρ       I      II     OPTA      I      II     OPTA
  -0.99   0.039  0.039  0.005     0.029  0.010  0.002
   0      0.349  0.348  0.051     0.036  0.015  0.006
   0.99   0.506  0.502  0.091     0.046  0.018  0.010
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Introduction 

The  estimation  of  motion  and  boundaries  of  objects  in  an 
image  sequence  is  an  important  issue  for  efficient  video 
compression.  It  allows  exploitation  of  the  strong  statis¬ 
tical  bindings  of  image  intensities  along  the  motion  tra¬ 
jectories  [1,  4]. 

The  idea  of  object-oriented  motion  estimation  is  to  sub¬ 
divide  the  scene  into  regions  of  continuous  motion.  Thus 
discontinuities  in  the  motion  field  may  only  occur  at 
region  boundaries.  Ideally,  each  region  uniquely  corre¬ 
sponds  to  one  surface  of  a  moving  object  in  the  3-D  real 
world. 

The  motion  field  is  regarded  as  a  pair  of  random  fields 
V  =  (U,  L),  where  U  denotes  a  field  of  one  motion  vector 
per  pixel,  and  L  denotes  a  generic  segmentation  field. 
The  segmentation  field  groups  the  motion  vectors  into 
several  continuous  regions. 

Image  Model 

This contribution follows a model-based Bayesian approach. The model considers the MSE of motion-compensated prediction, motion discontinuities and uncovered regions. The resulting estimation criterion is derived straightforwardly as a MAP criterion with the help of the model assumptions.

Given the samples $a$ (= previous frame) and $b$ (= next frame) of the random fields $A$ and $B$, the objective is to find the motion field sample $v^* = (u^*, l^*)$ of $V$ of maximum a posteriori probability

$$P(V = v^* \mid A = a, B = b) = \max_v \{P(V = v \mid A = a, B = b)\} = \frac{P(B = b \mid V = v^*, A = a) \cdot P(V = v^* \mid A = a)}{P(B = b \mid A = a)}, \tag{1}$$

where  the  reformulation  of  the  objective  function  follows 
from  Bayes  rule. 

The first factor of the numerator is described by a so-called observation model. In this contribution it is assumed to depend on the displaced frame difference (dfd) only. The dfd is modeled as segmentwise stationary, obeying a white, zero-mean generalized Gaussian distribution in each segment.

In each segment not corresponding to uncovered background, ML estimates for $\sigma$ and $\nu$ are substituted back into the objective function. In uncovered regions a likelihood that is slightly lower than that of the least likely region is imposed. This ensures that motion is estimated wherever a reasonable correspondence between regions of successive frames can be established.


The second factor of the numerator in (1) mainly accounts for the spatio-temporal statistical bindings within a motion field and is described by a motion model representing prior expectations on $V$. The principle of minimum description length [3] is adopted, which assigns each sample $v$ a probability according to its content of decision

$$P(V = v \mid A = a) = P(V = v) = 2^{-C(v)}$$

where $C(v)$ denotes the code length of a (lossless) contour/texture code. The contour code describes the segmentation matrix $l$ while the texture code segmentwise describes the motion vectors $u$ assuming strong statistical bindings of neighboring motion vectors belonging to the same object (label). It can be shown that $V$ is a Gibbs/Markov random field (cf. [5]).

Combining the two models, according to (1), the MAP criterion can be derived as


-  log  (P(F  =  t;|A  =  a,  S  =  !.))  = 


-}-log(2)  Qn)  +  const.,  (2) 

where $N_k$ denotes the number of pixels in the $k$-th region $G_k$ and $x_i$ denotes a pixel of the dfd. Criterion (2) is locally minimized by employing iterated conditional modes [2], which provides fast convergence.
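Iterated conditional modes can be sketched on a much smaller problem than motion estimation. The code below uses a toy energy on a 1-D chain of binary labels (a squared data term plus a penalty `beta` for each label change); this energy, the initialization by thresholding, and the function name are assumptions for illustration and are not the criterion (2) of the text.

```python
def icm_labels(obs, beta, max_sweeps=20):
    """Iterated conditional modes for binary labels on a 1-D chain.

    Toy energy (an assumption of this sketch):
      sum_i (obs[i] - l[i])**2 + beta * #{i : l[i] != l[i+1]}
    """
    labels = [1 if y >= 0.5 else 0 for y in obs]  # initial guess
    n = len(labels)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            best_label, best_cost = labels[i], None
            for cand in (0, 1):
                # Local energy of candidate label given current neighbours.
                cost = (obs[i] - cand) ** 2
                if i > 0 and labels[i - 1] != cand:
                    cost += beta
                if i < n - 1 and labels[i + 1] != cand:
                    cost += beta
                if best_cost is None or cost < best_cost:
                    best_label, best_cost = cand, cost
            if best_label != labels[i]:
                labels[i] = best_label
                changed = True
        if not changed:  # local minimum reached
            break
    return labels
```

Each site update can only lower the total energy, so the sweeps terminate at a local minimum; this is the source of the fast but only locally optimal convergence.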
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Abstract  —  In  this  short  paper,  we  describe  how  multi-grid  methods, 
originally developed for numerical solution of differential equations, can be used in the mean field calculations for Markov random fields in EM procedures to reduce computation and improve convergence.

1.  Introduction: 

Many problems in image processing and computer vision can be formulated as incomplete data problems [1], where part of the data is not observable (hidden) and needs to be estimated along with the data model (in particular, model parameters). The Markov random field (MRF) has been demonstrated as a very general and effective model for the hidden variables since it captures the underlying physical processes and constraints (e.g., [2]).

The EM (expectation-maximization) algorithm [1] is an effective maximum-likelihood (ML) procedure for parameter and hidden variable estimation in incomplete data problems. However, when the hidden variables are modeled as MRFs, it runs into difficulty due to the exponential complexity in the calculation of the conditional mean of the MRF required by the E-step. To overcome this difficulty, we have developed an iterative procedure [3] based on the mean field theory (MFT) of statistical mechanics. This approach provides a mathematically sound (in some sense optimal) approximation that can be calculated without the exponential complexity.

While the efficacy of the MFT approach has been demonstrated in our previous work in image segmentation and image restoration, it is prone to a problem common to many iterative procedures: as the data size increases, the amount of computation increases drastically and the convergence slows down. In fact, this problem plagued numerical methods for differential equations for quite some time until a solution, known as the multi-grid (MG) method, was developed. In this paper, we summarize our work in applying MG methods to the mean field calculations in EM procedures; more details can be found in [4].

2.  MFT  in  EM  Procedures: 

Let $S$ be a 2-D lattice with a neighborhood system and $u = \{u_i, i \in S\}$ be an MRF. Then it is well known that $u$ has a Gibbs distribution:

$$p(u) = Z^{-1} \exp[-\beta U(u)] = Z^{-1} \exp\Big[-\beta \sum_c V_c(u)\Big] \tag{1a}$$

where $U(u)$ is the energy function, the $V_c(u)$'s are the clique potentials, and $Z$ is the partition function.

It is not difficult to see that the direct calculation of the mean of $u$, $\langle u \rangle$, is exponentially complex since one needs to sum over all possible configurations of $u$. The MFT suggests an approximation [3]: the influence of $u_j$, $j \ne i$, in the calculation of $\langle u_i \rangle$ can be approximated by that of $\langle u_j \rangle$, i.e.,

$$\langle u_i \rangle \approx \sum_{u_i} u_i\, (Z_i^{mf})^{-1} \exp[-\beta U_i^{mf}(u_i)], \tag{2a}$$

where

$$U_i^{mf}(u_i) = \sum_{j \in N_i} V_c(u_i, \langle u_j \rangle) \tag{2b}$$

and $Z_i^{mf}$ is a normalization factor. In an EM procedure where the hidden variables are modeled as an MRF, the mean field theory of (2a)-(2b) can be used in the E-step to evaluate the Q function:

$$Q(\Phi \mid \Phi^{(p)}) = \big\langle \log p(y \mid u, \Phi) + \log p(u \mid \Phi) \,\big|\, y, \Phi^{(p)} \big\rangle, \tag{3}$$

where $y$ is the observed data, $\Phi$ is the model parameter vector, and $p$ denotes the $p$th iteration.
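The mean field idea (replace each neighbour by its current mean, then average the site variable under the resulting single-site distribution) can be sketched for the simplest case. The code assumes spins $u_i \in \{-1, +1\}$ on a 1-D chain with energy $U(u) = -J \sum_i u_i u_{i+1} - \sum_i h_i u_i$; this model, the sweep count, and the function name are assumptions for illustration rather than the MRF used in the text.

```python
import math


def mean_field_chain(h, beta, coupling, sweeps=200):
    """Mean-field means <u_i> for spins u_i in {-1, +1} on a 1-D chain.

    Assumed energy: U(u) = -coupling * sum_i u_i*u_{i+1} - sum_i h[i]*u_i.
    """
    n = len(h)
    mean = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Effective field: external term plus neighbours' current means.
            field = h[i]
            if i > 0:
                field += coupling * mean[i - 1]
            if i < n - 1:
                field += coupling * mean[i + 1]
            # Mean of u_i under exp(-beta * U_i^mf(u_i)) / Z_i^mf.
            mean[i] = math.tanh(beta * field)
    return mean
```

Each sweep costs time linear in the number of sites, in contrast to the exponential cost of summing over all configurations; the number of sweeps needed is what grows with problem size and motivates the multi-grid acceleration.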

3.  Application  of  MG  Methods. 

For the sake of simplicity, we illustrate the MG ideas through a two-grid method. Let $S$ be the fine grid, now denoted by $S^h$; one can then generate a coarse grid $S^H$ by merging neighboring sites in $S^h$ (e.g., merging every four neighboring sites into one site). The two-grid method achieves computation reduction and convergence acceleration by alternating mean field calculations between the fine and coarse grids rather than just on the fine grid (e.g., fine first, then coarse, then fine). Two important problems in this method are: how to transfer between $u^h$ and $u^H$, and how to define the coarse-grid energy function $U^H(u^H)$. The solution to the first problem is easy: through an interpolator and a restriction operator, i.e., $I_H^h : u^H \to u^h$ and $I_h^H : u^h \to u^H$, respectively. The second problem is, on the other hand, more difficult. We have experimented with two strategies: the "fractal" method (energy function the same in different grids, similar to [5]) and Galerkin's method ($U^H(u^H) = U^h(I_H^h u^H)$). Typical results by Galerkin's method are shown in Fig. 1 for image segmentation. To achieve the same segmentation quality (MSE of classification), the MG scheme uses only 2/3 of the time used by a fine-grid-only scheme (the saving is much larger for a larger image, e.g., 256 × 256).
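The grid-transfer operators and the Galerkin coarse-grid energy can be sketched as follows. Block averaging for restriction and value replication for interpolation are assumed choices (the text does not specify the operators), grid sides are assumed even, and the function names are hypothetical.

```python
def restrict(fine):
    """Fine-to-coarse transfer: average each 2x2 block (even sides assumed)."""
    rows, cols = len(fine), len(fine[0])
    return [[(fine[r][c] + fine[r][c + 1]
              + fine[r + 1][c] + fine[r + 1][c + 1]) / 4.0
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]


def interpolate(coarse):
    """Coarse-to-fine transfer: copy each coarse value into a 2x2 block."""
    fine = []
    for row in coarse:
        wide = [v for v in row for _ in range(2)]
        fine.append(list(wide))
        fine.append(list(wide))
    return fine


def galerkin_energy(fine_energy):
    """Coarse-grid energy defined by evaluating the fine-grid energy on the
    interpolated coarse field."""
    return lambda coarse: fine_energy(interpolate(coarse))
```

With these two operators, restricting an interpolated field returns the original coarse field, so information computed on the coarse grid is not distorted by a round trip.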

4. Summary:

In  this  paper,  we  have  described  the  application  of  MG  methods  in 
mean  field  calculations  in  EM  procedures  where  the  hidden  variables  are 
modeled  as  MRF’s.  This  approach  achieves  computation  reduction  and 
acceleration  of  convergence  through  alternating  the  calculation  of  the  mean 
field  among  different  grids  (fine-coarse-fine).  Both  fractal  and  Galerkin 
coarsening  provide  good  results. 
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Figure  1:  An  Example  in  Image  Segmentation. 

The first row shows the true "region map" (what the segmentation should be like) and the observed image. The second row shows the results of three iterations on the fine grid, a coarse grid, and the fine grid, respectively. The convergence is illustrated by the MSE (between consecutive segmentations in the iterations). Galerkin's method is used for coarsening.
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The  paper  presents  a  solution  to  the  general  problem  of 
fitting  a  parametric  model  to  observations  from  a  sin¬ 
gle  realization  of  a  2-D  homogeneous  random  field  with 
mixed  spectral  distribution.  So  far,  the  problem  has 
been  largely  unsolved.  Existing  methods  either  assume 
the  field  has  an  absolutely  continuous  spectral  distribu¬ 
tion  and  try  to  fit  white  noise  driven  linear  models  to 
the  observed  field,  or  treat  the  special  case  of  estimat¬ 
ing  the  parameters  of  a  sinusoidal  signal  in  white  noise. 
The existence of evanescent random fields has not received
attention in the estimation literature, although
the  evanescent  components  have  major  impact  on  the 
structure  and  properties  of  the  random  field,  as  they 
result  in  directional  attributes  in  the  observed  realiza¬ 
tions.  We  present  a  maximum-likelihood  solution  to  this 
estimation  problem. 

On  the  basis  of  a  2-D  Wold-like  decomposition  [1],  the 
random  field  is  decomposed  into  a  sum  of  two  mutu¬ 
ally  orthogonal  components:  a  purely-indelenninistic 
field  and  a  deterministic  one.  The  2-D  deterministic 
random  field  is  further  orthogonally  decomposed  into  a 
half-plane  deterministic  field  and  a  generalized  evanes¬ 
cent  field.  The  generalized  evanescent  field  is  a  linear 
combination  of  a  countable  number  of  mutually  orthog¬ 
onal  evanescent  fields.  A  typical  example  of  an  evanes¬ 
cent  field  is  a  2-D  separable  random  field  which  is  the 
product  of  a  1-D  purely-indeterministic  random  process 
along  one  axis  and  a  harmonic  1-D  process  in  the  or¬ 
thogonal  dimension.  The  above  decomposition  implies 
a  similar  decomposition  of  the  spectral  measure  of  the 
regular  random  field  into  a  sum  of  mutually  singular 
spectral  measures,  each  associated  with  a  different  com¬ 
ponent  of  the  spatial  decomposition.  The  spectral  dis¬ 
tribution  function  of  the  purely  indeterministic  field  is 
the  absolutely  continuous  component  of  the  regular  field 
spectral  distribution.  The  spectral  measure  of  the  de¬ 
terministic  field  is  concentrated  on  a  set  of  Lebesgue 
measure zero in the 2-D frequency plane. For practical
applications,  these  results  suggest  that  the  deterministic 

This work was partially supported by NSF grant MIP-9120377.

Currently with the I.B.M. T. J. Watson Research Center, NY.
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field  “spectral  density  function”  has  the  form  of  a  sum 
of  1-D  and  2-D  Dirac  delta  functions.  The  harmonic 
random  field,  which  generates  the  2-D  delta  functions  of 
the  “spectral  density”  is  one  of  the  components  of  the 
half-plane  deterministic  random  field.  The  1-D  delta 
functions  which  are  supported  on  lines  of  rational  slope 
result  from  the  generalized  evanescent  random  field. 
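
As a rough illustration of the separable evanescent-field example mentioned above (a 1-D purely-indeterministic process along one axis multiplied by a 1-D harmonic process along the other), the following Python sketch generates one such realization. The AR(1) coefficient, the frequency, the phase, the grid size, and the function names are all assumptions chosen for illustration; none of them come from the paper.

```python
import math
import random


def ar1_process(length, coeff=0.8, seed=0):
    """Hypothetical 1-D purely-indeterministic process: s[n] = coeff*s[n-1] + white noise."""
    rng = random.Random(seed)
    samples, prev = [], 0.0
    for _ in range(length):
        prev = coeff * prev + rng.gauss(0.0, 1.0)
        samples.append(prev)
    return samples


def evanescent_field(rows, cols, freq=0.2, phase=0.3, seed=0):
    """field[n][m] = s[n] * cos(2*pi*freq*m + phase): AR along n, harmonic along m."""
    s = ar1_process(rows, seed=seed)
    return [
        [s[n] * math.cos(2.0 * math.pi * freq * m + phase) for m in range(cols)]
        for n in range(rows)
    ]


field = evanescent_field(rows=8, cols=10)
```

Because the field is a product of a function of n and a function of m, every 2x2 minor of `field` vanishes, which is the separability the text refers to.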

Hence,  in  this  paper  we  concentrate  on  a  solution  to  the 
problem  of  estimating  the  parameters  of  the  harmonic 
and  generalized  evanescent  components  of  the  field  in 
the  presence  of  an  unknown  colored  noise  generated  by 
the  purely-indeterministic  component,  jointly  with  es¬ 
timating  the  purely-indeterministic  component  param¬ 
eters.  We  assume  that  the  purely-indeterministic  com¬ 
ponent  can  be  modeled  by  a  2-D  AR  model. 

The  suggested  algorithm  is  a  two-stage  procedure.  In 
the  first  stage  we  obtain  a  suboptimal  initial  estimate 
for  the  parameters  of  the  spectral  support  of  the  evanes¬ 
cent  and  harmonic  components  by  solving  the  set  of  2-D 
overdetermined  normal  equations  for  the  parameters  of 
a  high-order  linear  predictor  of  the  observed  data.  In 
the  second  stage  we  refine  these  initial  estimates  by  it¬ 
erative  maximization  of  the  conditional  likelihood  of  the 
observed  data,  which  is  expressed  as  a  function  of  only 
the  parameters  of  the  spectral  supports  of  the  evanes¬ 
cent  and  harmonic  components.  This  representation  is 
possible  due  to  a  parameter  transformation  developed 
in  this  work.  The  solution  for  the  unknown  spectral 
supports  of  the  harmonic  and  evanescent  components 
reduces  the  solution  for  the  other  unknown  parameters 
of  the  field,  to  a  linear  least  squares  solution.  Exper¬ 
imental  evidence  is  presented  to  demonstrate  the  high 
accuracy  of  the  estimates  for  each  of  the  random  field 
components:  harmonic,  evanescent  of  any  orientation, 
and  purely  indeterministic. 
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SUMMARY 


Wozencraft and Kennedy [1] and Massey [2] have argued
that the $R_0$ criterion is the logical choice for the design of modulation
systems for coded digital communications. This paper
develops  a  relationship  between  the  change  in  the  Ro  parame¬ 
ter  for  quantized  receivers  and  the  asymptotic  relative  efficiency 
(ARE)  of  quantized  detectors,  which  sheds  light  on  both  receiver 
design  and  performance  analysis. 

Considering only binary input communication systems with
Q output symbols and assuming that the transmission of data
is corrupted by zero mean, additive white Gaussian noise, the
quantized receiver has

$R_0^Q = 1 - \log_2\left[1 + \sum_{h=1}^{Q} \sqrt{q_{1h}\,q_{2h}}\right]$  (1)

where $q_{1h} = P(b_h \mid \sqrt{E_s})$ and $q_{2h} = P(b_h \mid -\sqrt{E_s})$
are the integrals of the Gaussian noise density, centered at
$\pm\sqrt{E_s}$, over the quantization interval $A_h$. Here, the $x_h$
specify the end points of the quantization interval $A_h$, and $E_s$ is
defined as the signal energy per dimension. The expression for the
cutoff rate of a DMC when employing an optimum receiver with no
quantization ($Q = \infty$) is given by [3]

$R_0 = 1 - \log_2(1 + e^{-E_s/N_0})$,  (2)

where $N_0/2 = \sigma^2$ is the noise power spectral density of the
additive white Gaussian noise (AWGN) channel.

In order to determine the amount of degradation due to
receiver quantization, we must first determine the $\{x_h\}$ that
maximize (1). One finds that the necessary condition for the
$\{x_h\}$ to be optimal is the condition of [2], which sets each $x_h$
proportional to $\ln[\Lambda(b_h)\Lambda(b_{h+1})]^{1/2}$, where
$x_0 = -\infty$ and $x_Q = \infty$ and
$\Lambda(b_h) = P(b_h \mid \sqrt{E_s})/P(b_h \mid -\sqrt{E_s}) = q_{1h}/q_{2h}$.
Here the $b_h$ corresponds to the output level associated with the
$A_h$ interval. An iterative numerical technique can be used to solve
for the set of $\{x_h\}$.

Having obtained the threshold levels $\{x_h\}$ that maximize
$R_0^Q$, we may now determine the amount of degradation introduced
by certain quantization schemes by comparing the difference
between $R_0$ and $R_0^Q$.
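
The comparison between the unquantized and quantized cutoff rates can be sketched numerically. The Python fragment below assumes antipodal signal levels $\pm\sqrt{E_s}$, Gaussian noise of variance $N_0/2$, and the standard binary-input cutoff-rate expression $1 - \log_2(1 + \sum_h \sqrt{q_{1h} q_{2h}})$; the function names and the example thresholds are hypothetical and are not the optimized thresholds discussed in the text.

```python
import math


def gaussian_cdf(x, mean, sigma):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def cutoff_rate_unquantized(es_over_n0):
    """R_0 = 1 - log2(1 + exp(-Es/N0)) for binary antipodal signalling on AWGN."""
    return 1.0 - math.log2(1.0 + math.exp(-es_over_n0))


def cutoff_rate_quantized(thresholds, es, n0):
    """Cutoff rate when the receiver output is quantized at the given thresholds."""
    sigma = math.sqrt(n0 / 2.0)          # assumed noise standard deviation
    amp = math.sqrt(es)                  # assumed signal levels are +amp and -amp
    edges = [-math.inf] + sorted(thresholds) + [math.inf]
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        q1 = gaussian_cdf(hi, amp, sigma) - gaussian_cdf(lo, amp, sigma)
        q2 = gaussian_cdf(hi, -amp, sigma) - gaussian_cdf(lo, -amp, sigma)
        total += math.sqrt(q1 * q2)
    return 1.0 - math.log2(1.0 + total)


r_unq = cutoff_rate_unquantized(1.0)
r_hard = cutoff_rate_quantized([0.0], es=1.0, n0=1.0)             # Q = 2
r_soft = cutoff_rate_quantized([-0.5, 0.0, 0.5], es=1.0, n0=1.0)  # Q = 4
```

Refining the quantizer can only raise the quantized cutoff rate toward the unquantized value, so `r_hard`, `r_soft` and `r_unq` come out in increasing order.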

Next we consider the optimum quantization of data where
the quantized data are to be used to form a test of hypothesis for
signal detection. In particular, the optimum quantizer is defined
as the one that maximizes detection efficacy. The problem that
we consider here is the detection of a constant positive signal s
in additive noise with a symmetric density function $f(x)$. For
testing the hypothesis H: $\mathbf{r}$ consists of noise only, versus the
alternative K: $\mathbf{r}$ consists of signal and noise, where $\mathbf{r}$ is
the received vector, the generalized test statistic, based on quantized
data, may be described by [4] $S = \sum_i s_i q_i$, where the $q_i$
are the quantized data and $\{s_i\}$ is the known signal sequence
representing the dc signal. The efficacy $\mathcal{E}_S$ for this test
statistic for the constant-signal case (i.e. $s_i = s$ = constant for
all i) may be written as [4]

$\mathcal{E}_S = \frac{2\left\{\sum_{j=1}^{Q/2} l_j\,[f(a_j) - f(a_{j-1})]\right\}^2}{\sum_{j=1}^{Q/2} l_j^2\,[F(a_j) - F(a_{j-1})]}$  (3)

Here the $a_j$ specify the end points of the Q input ranges, the
$l_j$ correspond to the output levels of each input range, and F is the
distribution function of the noise. The vectors $\mathbf{a}$ and
$\mathbf{l}$ that maximize (3) when $f(x)$ is the normalized Gaussian
density function are given by
$l_j = \int_{a_{j-1}}^{a_j} x f(x)\,dx \,/ \int_{a_{j-1}}^{a_j} f(x)\,dx$
and $a_j = (l_{j+1} + l_j)/2$ [4].

In order to evaluate the performance of the quantized detectors,
we compare them to the linear detector since this is the optimum
detector when detecting a dc signal in Gaussian noise [5].
The test statistic of the linear detector is given by
$S_l = \sum_i s_i x_i$ (with $x_i$ the unquantized received data)
and its corresponding efficacy is $\mathcal{E}_{S_l} = 1/\sigma^2$, where
$\sigma^2$ is the variance of the noise density.

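We obtain the ARE's of the quantized detectors relative to
the linear detector by taking the ratio of their efficacies:

$\mathrm{ARE}_{S,S_l} = \frac{\mathcal{E}_S}{\mathcal{E}_{S_l}} = \frac{2\sigma^2\left\{\sum_{j=1}^{Q/2} l_j\,[f(a_j) - f(a_{j-1})]\right\}^2}{\sum_{j=1}^{Q/2} l_j^2\,[F(a_j) - F(a_{j-1})]}$  (4)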


Taking  the  background  noise  density  to  be  the  normalized  Gaus¬ 
sian  density  function,  we  obtain  the  desired  results. 

It  is  shown  that  in  the  vanishing  signal/large  sample  size 
case,  the  two  efficiency  measures  give  the  same  results  for  all 
values  of  Q. 
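
The ARE of a quantized detector relative to the linear detector can be evaluated with a few lines of code. This sketch assumes unit-variance Gaussian noise, a symmetric quantizer described by its positive thresholds, output levels equal to the conditional means of each interval, and the efficacy in the form 2(sum of l_j [f(a_{j-1}) - f(a_j)])^2 / (sum of l_j^2 [F(a_j) - F(a_{j-1})]); the function names and the four-level threshold value are illustrative assumptions.

```python
import math


def pdf(x):
    """Standard normal density (zero at infinity)."""
    return 0.0 if math.isinf(x) else math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def quantizer_are(positive_thresholds):
    """ARE of a symmetric quantizer-detector relative to the linear detector (sigma = 1)."""
    edges = [0.0] + sorted(positive_thresholds) + [math.inf]
    numerator = 0.0
    denominator = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mass = cdf(hi) - cdf(lo)
        level = (pdf(lo) - pdf(hi)) / mass   # conditional mean of x on [lo, hi)
        numerator += level * (pdf(lo) - pdf(hi))
        denominator += level * level * mass
    efficacy = 2.0 * numerator ** 2 / denominator
    return efficacy                           # linear-detector efficacy is 1 when sigma = 1


are_two_level = quantizer_are([])          # sign detector, Q = 2
are_four_level = quantizer_are([0.9816])   # assumed four-level threshold, Q = 4
```

For Q = 2 the result is 2/pi, the familiar ARE of the sign detector in Gaussian noise; adding a threshold raises it toward 1.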
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Abstract 

In  this  paper,  a  new  procedure  for  decoding  cyclic  and  BCH 
codes  up  to  their  actual  minimum  distance  is  presented.  Previous 
algebraic  decoding  procedures  for  cyclic  and  BCH  codes  such  as 
the Peterson decoding procedure and our procedure using nonrecurrent
syndrome dependence relations can be regarded as special
cases of this new decoding procedure. With the aid of a computer
program, it has been verified that, using this new decoding procedure,
all binary cyclic and BCH codes of length 63 or less can be
decoded up to their actual minimum distance. The procedure
incorporates an extension of our Fundamental Iterative Algorithm
and the complexity of this decoding procedure is $O(n^3)$.

Summary 

For some years, algebraic decoding of cyclic and BCH codes
has been restricted by the minimum distance bounds of the codes.
Previous algebraic decoding algorithms (Berlekamp-Massey,
Euclidean, and our generalizations) have aimed at solving
Newton's identities which can be viewed as a set or sets of linear
recurrences. We have recently introduced a procedure that frees the
decoding of cyclic and BCH codes from the confinement of the
bounds and can decode many cyclic and BCH codes up to their
actual minimum distance [1]. In our recent procedure, the decoding
is accomplished through the determination of nonrecurrent dependence
relations among the syndromes. However, the application of
this procedure depends on a condition that has to be satisfied for a
code to be so decoded. Thus, that decoding procedure is still short
of the desired final goal of achieving decoding of all cyclic and
BCH codes up to their actual minimum distance. In this paper, we
present a new decoding procedure that does not depend on the
satisfaction of this condition. We show that, for a code with actual
minimum distance d to correct up to $t = \lfloor (d-1)/2 \rfloor$ errors, all
that is required is that a $(2t+1) \times (2t+1)$ syndrome matrix can be so
formed that the syndromes above the minor diagonal are all known
and those at the minor diagonal are some unknowns and their
conjugates. With reference to the table of codes listed in van Lint and
Wilson's paper [2] and with the aid of a computer program, the
existence of at least one such matrix for each code has been
verified for all binary codes of length 63 or less. Thus, to say the
least, the procedure is capable of decoding all binary cyclic and
BCH codes of length $\le 63$ up to their actual minimum distance.
We have also demonstrated the existence of such syndrome
matrices for some codes of length greater than 63. The procedure
is a very general one and includes previously mentioned algebraic
decoding procedures as special cases. It can be applied to the
decoding of codes of any length for which such syndrome patterns
exist.

This work was supported by the National Science Foundation under Grant
NCR-9016095.


More specifically, the syndrome matrix S referred to in this
paper is of the following form:

$S = \begin{bmatrix} S_{h_1+j_1} & S_{h_1+j_2} & \cdots & S_{h_1+j_{2t+1}} \\ S_{h_2+j_1} & S_{h_2+j_2} & \cdots & S_{h_2+j_{2t+1}} \\ \vdots & \vdots & & \vdots \\ S_{h_{2t+1}+j_1} & S_{h_{2t+1}+j_2} & \cdots & S_{h_{2t+1}+j_{2t+1}} \end{bmatrix}$

where the triangular portion of S above the minor diagonal consists
of known syndromes and the syndromes at the minor diagonal of S
are some unknowns and their conjugates.

Under the assumption that v errors actually occurred, where
$v \le t$, there exist at most v columns of S which are linearly
independent. The other columns are then dependent on these
columns. A major step for this decoding procedure is then to determine
the unknown syndromes through the linear dependence relations
among the columns of S. In this paper, we show that this can
be accomplished through an extension of the Fundamental Iterative
Algorithm we first introduced in [3].

Once $S_0, S_1, S_2, \ldots, S_{n-1}$ are computed, the error vector can be
determined through an inverse Fourier transform of the syndrome
vector $(S_0, S_1, S_2, \ldots, S_{n-1})$.
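
The last step, recovering the error vector from the full set of syndromes by an inverse Fourier transform, can be sketched over a small field. The example below assumes length n = 15, the field GF(16) built from the primitive polynomial x^4 + x + 1, and syndromes defined as S_j = e(alpha^j); these choices and all names are illustrative, not taken from the paper.

```python
# GF(16) from the primitive polynomial x^4 + x + 1 (an illustrative choice).
N = 15
EXP = [0] * (2 * N)
LOG = [0] * 16
value = 1
for power in range(N):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x10:
        value ^= 0x13
for power in range(N, 2 * N):
    EXP[power] = EXP[power - N]


def gf_mul(a, b):
    """Multiply two GF(16) elements using log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(error):
    """S_j = sum_i e_i * alpha^(i*j) for j = 0..N-1 (Fourier transform of e)."""
    out = []
    for j in range(N):
        acc = 0
        for i, e_i in enumerate(error):
            acc ^= gf_mul(e_i, EXP[(i * j) % N])
        out.append(acc)
    return out


def inverse_transform(synd):
    """e_i = sum_j S_j * alpha^(-i*j); the factor 1/N is 1 because N is odd in characteristic 2."""
    out = []
    for i in range(N):
        acc = 0
        for j, s_j in enumerate(synd):
            acc ^= gf_mul(s_j, EXP[(-i * j) % N])
        out.append(acc)
    return out


error = [0] * N
error[3] = 1
error[10] = 1
recovered = inverse_transform(syndromes(error))
```

In a decoder only some of the syndromes are known from the received word; the procedure described above supplies the remaining ones before this inverse transform is applied.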

We note that the decoding of the (41, 21, 9) quadratic residue
code [4] can be much more easily handled by this new procedure.
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Abstract 

Conventional  decoding  techniques  for  decoding  cyclic  codes  re¬ 
quire  the  computation  of  power  sum  syndromes  which  can  often 
account  for  a  significant  portion  of  the  decoder  computations. 
Since  the  syndromes  can  be  computed  from  the  remainder  poly¬ 
nomial,  the  polynomial  obtained  by  dividing  the  received  polyno¬ 
mial  by  the  code  generator  polynomial,  it  follows  that  this  poly¬ 
nomial  contains  all  the  information  required  to  decode.  Thus 
one  might  hope  for  a  decoding  technique  that  uses  the  remain¬ 
der  polynomial  directly.  Berlekamp  and  Welch  have  given  such 
an  algorithm  which  requires  the  sequential  testing  of  the  par¬ 
ity  check  locations  and  updating  of  four  polynomials.  Whiting 
in  his  doctoral  thesis  has  given  a  modification  of  this  procedure 
that  makes  more  efficient  the  evaluation  and  updating  of  these 
polynomials. 

The  present  work  derives  a  new  algorithm  using  only  the  remain¬ 
der  polynomial.  A  new  key  equation  is  derived  which  may  be 
solved  by  the  usual  Euclidean  algorithm.  The  advantages  of  this 
approach  are  discussed  and  compared  to  the  original  algorithm 
and the performance of the algorithm in terms of computational
and  circuit  complexity  is  considered. 

Summary 

Conventional  decoding  techniques  for  cyclic  codes  require  the 
computation  of  the  power  sum  syndromes.  These  are  given  by 
the  evaluation  of  the  remainder  polynomial,  the  polynomial  ob¬ 
tained  by  dividing  the  received  word  by  the  code  generator  poly¬ 
nomial,  at  the  roots  of  the  generator  polynomial.  Such  computa¬ 
tions  can  absorb  a  significant  part  of  the  decoding  effort.  As  the 
remainder  polynomial  contains  all  the  information  required  to 
decode, it might be hoped to derive a technique that uses the remainder
polynomial directly. Berlekamp and Welch [1], [2] devised
such an algorithm which requires a sequential test of the parity
check  locations,  the  evaluation  of  discrepancies  at  these  locations 
and  the  updating  of  various  estimates.  The  procedure  involves 
four  polynomials  and  there  appears  to  be  no  obvious  way  to  split 
the  algorithm  into  the  more  conventional  determination  of  the  er¬ 
ror  locator  and  evaluator  polynomials  via  either  the  Berlekamp- 
Massey  or  Euclidean  algorithm  approach,  followed  by  a  Chien 
search. The Berlekamp-Welch algorithm was further considered
in the thesis by Whiting [3], which also contains an excellent
description of the algorithm itself. (As far as the authors are
aware, the algorithm itself has not appeared in the literature.)
He  devised  an  efficient  linear  scaling  technique  for  the  updating 
of  the  polynomials. 
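
The claim that the remainder polynomial carries the same syndrome information as the received word can be checked on a toy example. This sketch assumes a Reed-Solomon-style code over the prime field GF(7) with primitive element 3 and generator roots alpha and alpha^2; the field, the received word, and the helper names are assumptions made for illustration only.

```python
P = 7        # illustrative prime field GF(7)
ALPHA = 3    # a primitive element modulo 7


def poly_eval(coeffs, x):
    """Evaluate a polynomial (coeffs[i] multiplies x^i) modulo P by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def poly_mul(a, b):
    """Product of two polynomials with coefficients modulo P."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out


def poly_mod(num, den):
    """Remainder of num divided by a monic polynomial den, modulo P."""
    rem = list(num)
    for shift in range(len(rem) - len(den), -1, -1):
        factor = rem[shift + len(den) - 1]
        for k, d in enumerate(den):
            rem[shift + k] = (rem[shift + k] - factor * d) % P
    return rem[: len(den) - 1]


roots = [pow(ALPHA, j, P) for j in (1, 2)]
generator = [1]
for root in roots:
    generator = poly_mul(generator, [(-root) % P, 1])   # multiply by (x - root)

received = [2, 0, 5, 1, 6, 3]                           # arbitrary received word
remainder = poly_mod(received, generator)
syndromes_from_word = [poly_eval(received, r) for r in roots]
syndromes_from_remainder = [poly_eval(remainder, r) for r in roots]
```

Both syndrome lists agree because the received polynomial and its remainder differ by a multiple of the generator, which vanishes at the generator's roots.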


The  present  work  derives  a  new  remainder  based  algorithm.  It 
develops  a  key  equation  that  uses  the  remainder  polynomial, 
expressed  in  terms  of  the  Lagrange  interpolation  polynomials, 
which  allows  solution  for  the  error  locator  and  evaluator  poly¬ 
nomials  by  the  conventional  Euclidean  algorithm  technique.  In 
particular it shows that if the remainder polynomial of the Reed-
Solomon code with parameters $(n, k, d = n - k + 1 = 2t + 1)$ has
coefficients $r_k$ and

$F(x) = \sum_{k=0}^{2t-1} r_k h_k(x), \quad h_k(x) = c_k \prod_{i=0,\, i \neq k}^{2t-1} (x - \alpha^i)$

where $c_k$ depends on the code parameters only and $h_k(x)$ is a
Lagrange polynomial, then if there exist polynomials

$N(x),\ W(x), \quad \deg N(x) < \deg W(x) \le t$

which satisfy the key equation

$N(x)F(x) \equiv W(x) \bmod p(x), \quad p(x) = (x - 1)(x - \alpha) \cdots (x - \alpha^{2t-1})$

then  W(x)  is  the  error  locator  and  N(x)  is  closely  related  to 
the  usual  error  evaluator  polynomial.  Thus  the  Reed-Solomon 
code  can  be  decoded  directly  from  the  remainder  polynomial, 
bypassing  the  need  for  the  syndrome  polynomial. 

The  relationship  of  this  algorithm  to  the  usual  form  of  the  Berle¬ 
kamp- Welch  algorithm  and  the  implications  of  this  form  of  de¬ 
coding  in  terms  of  implementation  are  discussed  and  examples 
are  given. 
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Summary 

Consider an (n, k) Reed-Solomon (RS) code of length n
and redundancy r = n - k over the finite field GF(q). The usual
implementation of the RS encoder consists of an r stage feedback
shift register [1]. In some very high speed applications, the presence
of  the  accompanying  global  feedback  path  restricts  the  speed  of  the 
encoder. 

Recently, Seroussi has proposed an architecture for the RS encoder
[2]. Unlike the usual implementation of the RS encoder,
Seroussi's architecture does not require any global feedback path.
Furthermore, the architecture is of systolic type and has a modular
structure: it consists of one pre-processing cell and r Cauchy cells
[2]. This modularity feature of the encoder makes it suitable for
hardware implementation.

The  circuit  complexity  of  Seroussi's  RS  encoder  depends  essen¬ 
tially  on  the  Cauchy  cells.  Each  Cauchy  cell  computes  one  parity 
symbol  for  the  RS  code  and  contains  one  parallel  type  divider  for 
the finite field GF(q). Unfortunately, the realization of a divider is
much more complicated than that of a multiplier [3]. Let M denote
the circuit complexity of a parallel type multiplier of GF(q),
where $q = p^m$, p is prime and m is a positive integer. Then
the circuit complexity of a modular parallel divider is, in general,
O(mM) and that of Seroussi's RS encoder is O(rmM).

In this paper, we extend Seroussi's work. It is shown here that
the  Cauchy  cell  can  be  implemented  without  any  divider.  The 
proposed  Cauchy  cell  also  has  a  shorter  logic  path  and  yields  an 
RS  encoder  which  has  a  circuit  complexity  O(rM). 

Let $\alpha, \alpha^2, \ldots, \alpha^r$ be the roots of the RS code where $\alpha$ is
a primitive element of GF(q). In systematic form, the generator
matrix of the RS code can be written as

G = [I | A],

where  I  is  the  k  x  k  identity  matrix  and  A  is  a  k  x  r  matrix  where 
the  matrix  elements  belong  to  GF(q).  A  is  called  a  Cauchy  matrix 
and  its  elements  are  given  as  follows  [2]: 

^ (1) 

where 

=  -Of"~'-',  0<i<k-l, 

Hj  =  0<j<r-l, 


“■  =  — n 
0  <  /  <  * 

=  n  (a"-’-*--’ -q"-*-'),  0<j  <r-l. 


Consider the codeword $(d_0, d_1, \ldots, d_{k-1}, p_0, p_1, \ldots, p_{r-1})$,
where $d_i$ ($i = 0, 1, \ldots, k-1$) and $p_i$ ($i = 0, 1, \ldots, r-1$) are the
data and parity symbols, respectively. For $0 \le i \le k-1$, $0 \le j \le
r-1$, let

=  {x,  +  yj)B,j.  (2) 

Ri-t-hj  =  +  y})R<.j  +  d|Ui  (3) 

with $B_{0,j} = 1$ and $R_{0,j} = 0$. Then it can be shown that

$p_j = R_{k,j}, \quad j = 0, 1, \ldots, r-1.$  (4)

As the computation of $R_{i+1,j}$ ($0 \le i \le k-1$, $0 \le j \le r-1$)


requires  only  multiplication  and  addition  operations,  the  Cauchy 
cells,  each  of  which  computes  one  parity  symbol,  can  be  designed 
without  any  divider/inverter. 

The computation of $d_i u_i$ in (3) is done in the pre-processing cell
of  the  RS  encoder,  and  one  Cauchy  cell  can  recursively  compute 
one  parity  symbol  with  two  multiplications  and  two  additions  in 
each  time  step.  However,  the  multiplications  can  be  performed  in 
parallel  resulting  in  a  logic  path  consisting  of  one  multiplier  and  two 
adders.  This  logic  path  is  shorter  than  that  of  [2]  which  consists  of 
one  divider  and  two  adders. 
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Abstract 

We describe a new algorithm based on Gröbner bases of modules
for solving for the pair (a, b) the multivariable polynomial congruence
$as \equiv b \bmod I$ where I is an ideal in $k[x_1, \ldots, x_n]$ and s is given. The
restriction to one variable gives a new approach to decoding BCH and
(classical) Goppa codes.

1  Introduction 

Let $A = k[x_1, \ldots, x_n]$ where k is a field. Interpreting a polynomial s of total
degree m as a truncation of a formal power series we may ask for relatively
prime polynomials a, b where $a(0) \neq 0$ such that the expansion of b/a as
far as terms of degree m is equal to s. This problem may be regarded as a
special case of solving (for a and b) the congruence

$as \equiv b \bmod I$,  (1)

where I is an ideal in A and s is a given polynomial. In general we require
that a and b be relatively prime but drop the condition that $a(0) \neq 0$.

In the 1-variable problem, I is the ideal generated by a single polynomial
and it is well known that in this case the solution may be determined using
the extended Euclidean algorithm or the Berlekamp-Massey algorithm.
Neither of these is valid for n > 1. Sakata [3] has given an extension to
n variables of the Berlekamp-Massey algorithm and in [2] we gave a
corresponding generalization of the method based on the extended Euclidean
algorithm. The natural context for such a generalization is that of Gröbner
bases of polynomial modules. We proved in [2] that Gröbner bases of modules
can be used to solve congruence (1) for any ideal I.

The 1-variable technique can be applied to the decoding problem for
BCH and (1-variable) Goppa codes where it provides a new theoretical
derivation of an algorithm for solving the key equation which in practice is
equivalent to that based on the extended Euclidean algorithm. This new
theoretical underpinning is in a sense more “natural” than either of the two
classical methods. In the following sections we outline a direct derivation
of this algorithm. For further details cf. [1].

2  The  solution  module 

The 1-variable form of congruence (1) is

$as \equiv b \bmod g$,  (2)

where s (the syndrome polynomial) has degree at most 2t - 1 and g is a
polynomial of degree 2t ($x^{2t}$ or the Goppa polynomial). The set of all pairs
(a, b), without restriction, satisfying (2) forms an A-module G. Any such
module has a finite basis and it is easy to prove that $B = \{(0, g), (1, s)\}$
is a basis of G. Our aim is to determine from B another basis B' which
contains (a scalar multiple of) the specific required solution $(\sigma, \omega)$ in the
usual notation, where $\sigma(0) = 1$.

We impose a total order < on the set of terms $\{(x^p, 0), (0, x^q)\}$ in $A^2$ by
interleaving them as follows:

$(1,0) < (0,1) < (x,0) < (0,x) < (x^2,0) < (0,x^2) < \cdots$

This ensures that $(x^p, 0) > (0, x^q)$ if and only if p > q. Each pair (a, b)
can be expressed as a finite sum $\sum c_j t_j$ where $c_j \in k$ and $t_j$ is a term, and
the leading term of a pair (a, b) is that term in the decomposition which is
greatest relative to <. If the leading term of (a, b) has the form $(x^p, 0)$ we
say its leading term is on the left while if it has the form $(0, x^q)$ then its
leading term is on the right. Note that the leading term of (a, b) is on the
left if and only if $\delta a > \delta b$ (where $\delta$ denotes degree).
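
The interleaved term order and the left/right rule for leading terms can be written down directly. In this sketch a term is a hypothetical `(side, exponent)` pair and polynomials are coefficient lists with `poly[i]` multiplying x^i; the names `term_key`, `degree` and `leading_term` are illustrative and not part of the paper.

```python
LEFT, RIGHT = 0, 1


def term_key(side, exponent):
    """Sort key realising (1,0) < (0,1) < (x,0) < (0,x) < (x^2,0) < (0,x^2) < ..."""
    return (exponent, side)


def degree(poly):
    """Degree of a coefficient list; -1 stands for the zero polynomial."""
    for i in range(len(poly) - 1, -1, -1):
        if poly[i] != 0:
            return i
    return -1


def leading_term(a, b):
    """Return (side, exponent) of the greatest term of a nonzero pair (a, b)."""
    candidates = []
    if degree(a) >= 0:
        candidates.append((LEFT, degree(a)))
    if degree(b) >= 0:
        candidates.append((RIGHT, degree(b)))
    return max(candidates, key=lambda t: term_key(*t))


chain = [(LEFT, 0), (RIGHT, 0), (LEFT, 1), (RIGHT, 1), (LEFT, 2), (RIGHT, 2)]
```

With this key the leading term falls on the left exactly when the degree of a exceeds the degree of b; ties go to the right, matching the order displayed above.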

A pair (a, b) can be reduced by a pair (a', b') if the leading term of
(a', b') is on the left and $\delta a \ge \delta a'$, or if the leading term of (a', b') is on
the right and $\delta b \ge \delta b'$. The reduction step will be defined with reference to
the left hand side; an analogous definition applies on the right. Suppose
$a = a_l x^l + \cdots + a_0$, $a' = a'_m x^m + \cdots + a'_0$ where $a_l \neq 0$, $a'_m \neq 0$, $l \ge m$ and
the leading term of (a', b') is on the left. Then we say (a, b) is reduced by
(a', b') to $(a'', b'') = (a, b) - (a_l/a'_m)\,x^{l-m}(a', b')$. It is clear that $\delta a'' < \delta a$.

We call a basis D of a module M a reduced basis if none of its elements
can be reduced by any other.

Theorem 2.1 (i) Let M be a module. Then a reduced basis of M consists
either of a single element or of two elements $\{(a_1, b_1), (a_2, b_2)\}$ where
$(a_1, b_1)$ has leading term on the left and $(a_2, b_2)$ has leading term on the
right. Moreover, in the latter case $\delta a_1 > \delta a_2$ and $\delta b_1 < \delta b_2$.

(ii) Let D be a reduced basis of M and let $(a, b) \in M$. Then the leading
term of (a, b) is a multiple of the leading term of an element of D.

(iii) A reduced basis of M is a Gröbner basis.

Since we are interested in the module G arising in the decoding
application we may assume that G contains an element $(\sigma, \omega)$ where $\sigma$ and $\omega$
are relatively prime and $\delta\omega < \delta\sigma \le t$. In particular the leading term
of $(\sigma, \omega)$ is on the left. (We use the condition $\sigma(0) = 1$ in Step 2 of the
algorithm below.)

Theorem 2.2 $(\sigma, \omega)$ is an element of least leading term in G. Every
reduced basis of G contains a scalar multiple of $(\sigma, \omega)$.

As a consequence of this theorem we have the following algorithm for
solving the key equation.

Algorithm

1. Reduce the basis $B = \{(0, g), (1, s)\}$ to a reduced basis
$B' = \{(u_1, v_1), (u_2, v_2)\}$ where $(u_1, v_1)$ has leading term on the left.

2. Set $(\sigma, \omega) = (u_1, v_1)/u_1(0)$.

References 

[1] P. Fitzpatrick. A new derivation of an algorithm for solving the key
equation. (Submitted for publication.)

[2] P. Fitzpatrick, J. Flynn. A Gröbner basis technique for Padé approximation.
J. Symbolic Computation, 13 (1992), 133-138.

[3] S. Sakata. Extension of the Berlekamp-Massey algorithm to n dimensions.
Information and Computation, 84 (1990), 207-239.




ON THE MINIMUM CODE LENGTH OF s-STEP (T, U)
PERMUTATION DECODABLE CYCLIC CODES


Anader  Benyamin-Seeyar,  Tho  Le-Ngoc,  and  Ming  Jia 
Department  of  Electrical  and  Computer  Engineering,  Concordia  University, 
1455  de  Maisonneuve  West,  Montreal,  Quebec,  Canada  H3G  1M8 


Summary 

One variation of error-trapping decoding which is known as “permutation
decoding” was introduced by Prange [1]. A serial decoder based on
this treatment was given by MacWilliams, who made use of code preserving
(T, U) permutation sets to obtain k error-free positions from which the
rest of the codeword could be reconstructed [2]. Recently, the exact lower
bounds on the code length n for two- and three-step (T, U) permutation
decodable (PD) (n, k, 2t+1) cyclic codes have been found [3-5]. In this
paper, we extend those results on the code length to any s-step (T, U)
permutation decodable cyclic codes with odd valued error-correcting
capability t. In addition, an optimum permutation step which makes the
most efficient improvement in the code rate of PD cyclic codes is also
given. Since the derivation of these results involves only the error
positions, the results are applicable to cyclic codes over $GF(2^m)$.

Main  Results 

The exact lower bounds on the code length n for two- and three-step
(T, U) permutation decodable (n, k, 2t + 1) cyclic codes are given in [3,
4]. Here we extend the results on the code length to s-step (T, U) permutation
decodable cyclic codes with odd-valued t only. The results are
presented in the form of theorems. Note that the subscripts "e" and "o" are
used in the manuscript to indicate the even and odd valued variables
respectively.

Theorem  1:  The  (n,k,,2r^  +  l)  codes  with 

-2’'' +  l)  +  2’‘‘,  S/^(2''^-l)  + 1  for  /„=5  and 

k,it^(2’  ^-l)-2'’^  +  3  fortes?,  ares-stepPD. 

Theorem  2;  The  (n,  2f^  +  1)  codes  with 

n=tJk^-2’-\2)^2’-\  t,sr„(2”^-l)  for  =  5  and 
kg^t^(2’  ^-l)-2'"^  +  2  for/„>7,  arej-stepPD. 


Theorem  3:  The  (n,  k^, 

2t„  +  1) 

codes  with 

"  =/„(*„- 2’ ■' -I- 2) -f  2'’'.  k„=t„(2’ 

-'-l)-2'-' 

for  /„  2  7,  ate 

not  J  -step  PD. 

Corollary:  The  (n,  i,. 

2t„  +  I) 

codes  with 

"  ='<,«^,-2'''  +  l)  +  2'''.  k„  =1^(2’ 

-^-l)-2’-^ 

for  /„  2  7,  are 

nets -step  PD. 

Now  we  present  the  bounds  on  the  code  rate  for  binary  (n,k,2t  +  1) 
permutation  decodable  cyclic  codes. 


Theorem  4:  The  code  rate  of  s-step  PD  codes  with  k  >  k  and  =  5  is 
less  than  !/(/_,  -  2  +  2/r^),  where  k’ =  t^(2’ -  \)-2  and  -  1. 

Theorem  5:  The  code  rate  of  j-step  PD  codes  with  k  >  k‘  and 
t^>l  is  less  then  l/(r„-2),  where  IrJ  =  r„(2’"^- 1)-2’'^  and 
**  =  *<.  +  1  ■ 

The above results present the bounds on the code length and the
code rate of s-step PD codes. Clearly, when the number of permutation
steps s increases, $R_s$ increases. Next, we show how $R_s$ increases with
respect to s.


Consider the case of $t_o = 7$ as an example (the cases of $t_o = 3$ and 5 are
similar); for s-step permutation,

**  =  t„(2’“^-l)-2"'  (7) 

=  =  +  (8) 

•  • 

Therefore,  when  each  more  permutation  is  applied,  k  increases  k  +  r„ . 

• 

On the other hand, when $k < k^*$, the code length n does not keep
decreasing at the rate of $\Delta n = (\Delta k)t_o$, so the increase in the code rate
$R_s$ is very slow. That is to say, $\Delta R_s$ increases as the number of
permutation steps s increases, until k is large enough that $k < k^*$ is
satisfied. The step corresponding to the largest $\Delta R_s$ is the optimum
permutation step which makes the most efficient improvement in the code rate
of PD codes. When the error correcting capability of a code is given as
$t_o$, the $k^*$ corresponding to every permutation step s is known; therefore, the
optimum permutation step can be estimated. From the implementation
point of view, this optimum step determines the number of U-permutation
steps used for achieving a higher code rate $R_s$.

From the results above, the relation between $R_s$, k and s as shown in
Figure 1 can be obtained, where $R^* = 1/(t_o - 2 + 2/t_o)$ and $R^* = 1/(t_o - 2)$
for $t_o = 5$ and $t_o \ge 7$ respectively. For the region of $k < k^*$, the code rate of
PD codes is around $R^*$.


Figure 1: The relation between R_s, k and s

In conclusion, the following comments can be made:

1.  When  *-»«>,  the  code  rate  of  s-step  PD  codes  decreases  and 
approaches  the  code  rate  of  the  codes  which  can  be  decoded  by  error- 
trapping  decoding. 

2. When k → k*, R_s approaches 1/(t_0 - 2 + 2/t_0) and 1/(t_0 - 2) for
t_0 = 5 and t_0 ≥ 7 respectively.

3. For each additional step, k* more than doubles. In the region
k < k*, the code length n does not keep decreasing at the rate of
Δn = (Δk)t_0, so the code rate R_s increases very slowly and stays
around the value R.

4. ΔR_s = R_{s+1} - R_s increases with the number of permutation steps s
until k is large enough that k < k* is satisfied. Therefore, there exists
an optimal step that yields the greatest improvement in the code rate of
PD codes. Given a t-error-correcting (n, k, 2t + 1) cyclic code, the
optimum step s can be determined. In this way the number of steps needed
to decode a given code can be estimated.


Suppose that the code rate of an s-step PD code is R_s; then
ΔR_s = R_{s+1} - R_s. This can be shown as follows:

(I) 

K  k„2‘'\-\)ln2 

(21 

- = - >0 

(5) 

ds 

13] 

+2)r„(r„-l)2’-V2 

-  ^  r> 

3 

(6) 

14] 

ds  n 

« 

So when k > k*, ΔR_s increases as the number of permutation steps increases.
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Abstract 

Many  digital  communication  channels  are  affected  by 
errors  that  tend  to  occur  in  bursts.  This  paper  proposes 
two  new  schemes  for  burst  error  correction.  Both  schemes 
employ  a  combination  of  two  codes.  In  the  first  scheme, 
one  of  the  codes  is  used  for  random  error  correction  and 
for  burst  detection  while  the  other  one  is  used  only  for  burst 
recovery.  In  the  second  scheme,  one  of  the  codes  is  used 
for  burst  detection  and  for  channel  state  estimation.  With 
the  second  scheme,  both  codes  are  used  for  error  correction. 
Unlike  existing  burst-error-correcting  schemes,  it  is  shown 
that  the  proposed  schemes  are  adaptive  to  channel  conditions 
and  less  sensitive  to  errors  in  the  guard  space.  For  the  same 
delay,  the  proposed  schemes  offer  better  performance  than  the 
interleaving  schemes.  When  the  channel  is  heavily  corrupted 
by  bursts,  the  improvement  is  even  more  pronounced. 

Summary 

Many  digital  communication  channels  are  affected  by 
errors  that  tend  to  occur  in  clusters  or  bursts  [1].  Several 
schemes  for  burst  error  correction  on  these  channels  have 
been  reported  [2-5].  One  approach  is  to  use  special  codes 
designed  exclusively  for  burst  error  correction  [4].  These 
so-called burst-error-correcting codes perform relatively well
over  channels  with  short  bursts,  but  perform  poorly  when 
the  channels  are  corrupted  with  long  bursts.  Another  con¬ 
ventional  approach  is  to  interleave  channel  symbols  prior  to 
transmission.  With  interleaving,  burst  errors  are  spread  over 
many  symbols,  and  can  thus  be  viewed  as  random  errors. 
However,  for  channels  with  long  bursts,  interleaving  schemes 
need  extremely  long  delay  to  be  effective,  which  might  not 
be  tolerated  in  some  applications.  Another  approach  is  Gal- 
lager’s  burst-finding  scheme  [2].  In  this  scheme,  a  rate  1/2 
systematic convolutional code is used with modified majority-logic
decoding. Gallager's scheme sacrifices random-error-correcting
capability in exchange for better burst correction. A modified
version of this scheme was recently suggested by Schlegel and
Herro [5]. This scheme is essentially the same as Gallager's
scheme except that majority
logic  decoding  is  replaced  by  a  modified  Viterbi  decoding  al¬ 
gorithm.  Both  Gallager's  burst-finding  scheme  and  Schlegel 
and  Herro’s  scheme  are  extremely  sensitive  to  errors  in  the 
guard  space. 
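
The interleaving approach mentioned above can be sketched as follows. This is a minimal illustration of a generic rows-by-columns block interleaver, not the specific interleaver of any cited scheme; the function names and the 4 x 6 dimensions used in the usage note are assumptions chosen for the example.

```python
def interleave(symbols, rows, cols):
    """Write symbols row by row into a rows x cols array, read column by column."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Inverse of interleave(): restore the original symbol order."""
    assert len(symbols) == rows * cols
    out = [None] * (rows * cols)
    k = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = symbols[k]
            k += 1
    return out


def error_positions_after_deinterleaving(burst_start, burst_len, rows, cols):
    """Positions, in the original symbol order, hit by a burst on the channel."""
    marks = [0] * (rows * cols)
    for k in range(burst_start, burst_start + burst_len):
        marks[k] = 1  # channel positions corrupted by the burst
    restored = deinterleave(marks, rows, cols)
    return [i for i, m in enumerate(restored) if m]
```

With rows = 4 and cols = 6, a channel burst covering four consecutive transmitted symbols lands on original positions that are six symbols apart, so the decoder sees isolated errors. The cost is that a whole rows x cols block must be buffered, which is the delay penalty the text refers to.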

Two efficient coding and decoding strategies are proposed
in this paper. Both schemes employ a combination
of  two  punctured  convolutional  codes  and  a  burst  detection 
procedure.  Burst  detection  is  accomplished  by  observing  the 
increment  in  the  cumulative  path  metrics  from  Viterbi  decod¬ 
ing.  Scheme  1  uses  two  punctured  convolutional  codes  with 

* This research was supported by the National Sciences and Engineering
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different  memories.  In  this  scheme,  a  code  with  a  relatively 
short  memory  is  used  with  Viterbi  decoding  for  random  er¬ 
ror  correction  and  for  burst  detection.  The  other  code  which 
has  a  much  longer  memory  is  used  with  backward  sequen¬ 
tial  decoding  to  recover  burst  errors.  Normally,  the  decoder 
operates  in  the  random  mode  and  it  uses  the  received  se¬ 
quence  corresponding  to  the  code  with  the  shorter  memory. 
An  abrupt  increase  in  the  cumulative  path  metrics  indicates 
that  the  channel  is  most  likely  in  a  burst.  The  decoder  then 
switches  from  the  random  mode  to  the  burst  mode,  and  starts 
burst  error  recovery.  In  the  burst  mode,  starting  from  a  chosen 
state,  the  decoder  employs  a  backward  sequential  decoding 
algorithm  to  recover  the  corrupted  data.  When  the  channel 
becomes  quiet,  the  path  metrics  are  relatively  constant,  and 
the  decoder  returns  to  the  random  mode. 
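
The mode switching described above might be sketched as below. This is only an illustration under stated assumptions: the summary does not give the detection rule, so the sliding window and the two thresholds (`window`, `burst_threshold`, `quiet_threshold`) are hypothetical parameters, and the metric is assumed to be a cumulative distance where smaller is better.

```python
def detect_modes(best_metrics, window, burst_threshold, quiet_threshold):
    """Label each decoding step as 'random' or 'burst'.

    best_metrics[k] is assumed to be the cumulative metric of the best
    survivor path after step k (e.g. accumulated Hamming distance).
    """
    modes = []
    mode = "random"
    for k in range(len(best_metrics)):
        start = max(0, k - window)
        # Growth of the cumulative metric over the last `window` steps.
        increment = best_metrics[k] - best_metrics[start]
        if mode == "random" and increment >= burst_threshold:
            mode = "burst"    # abrupt increase: channel is likely in a burst
        elif mode == "burst" and increment <= quiet_threshold:
            mode = "random"   # metric flat again: channel has become quiet
        modes.append(mode)
    return modes
```

Using two thresholds rather than one gives some hysteresis, so a single noisy step does not toggle the decoder back and forth; whether the proposed scheme does this is not stated in the summary.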

Scheme  2  uses  two  punctured  codes  that  are  derived 
from  the  same  original  convolutional  code  with  complemen¬ 
tary  perforation  patterns.  One  code  sequence  is  transmitted 
after  a  delay  from  the  transmission  of  the  other  code  sequence. 
The  first  code  sequence  is  used  with  a  Viterbi  decoder  to  de¬ 
tect  bursts  using  the  same  burst  detection  procedure  as  in 
Scheme  1.  The  burst  detection  procedure  also  serves  for  es¬ 
timating  the  channel  state.  Both  received  sequences  are  then 
used  by  a  second  Viterbi  decoder  which  uses  the  channel  state 
information  provided  by  the  first  decoder. 
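
The idea of complementary perforation patterns can be illustrated with a short sketch. The pattern `[1, 1, 0, 1]` in the usage note is a made-up example, not a pattern from the paper, and the helper names are assumptions.

```python
def puncture(coded_bits, pattern):
    """Keep coded_bits[k] wherever the periodic pattern has a 1."""
    p = len(pattern)
    return [b for k, b in enumerate(coded_bits) if pattern[k % p] == 1]


def complement(pattern):
    """Complementary perforation pattern: keeps exactly what the other drops."""
    return [1 - x for x in pattern]


def merge(seq_a, seq_b, pattern, total_len):
    """Rebuild the mother-code sequence from the two punctured streams.

    seq_a was punctured with `pattern`, seq_b with its complement.
    """
    it_a, it_b = iter(seq_a), iter(seq_b)
    p = len(pattern)
    return [next(it_a) if pattern[k % p] == 1 else next(it_b)
            for k in range(total_len)]
```

Because the two patterns are complementary, the two transmitted streams together carry every bit of the original code sequence exactly once, which is what lets a second decoder work on the full original code once both streams have arrived.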

The proposed schemes are adaptive to channel conditions.
The parameters of the decoders can be chosen to optimize
the performance of the schemes. For the same delay,
these schemes outperform the conventional interleaving
schemes when the channel is heavily corrupted by bursts.
While  Gallager’s  burst-finding  scheme  and  Schlegel  and 
Herro’s  scheme  are  sensitive  to  errors  in  the  guard  space,  the 
proposed  schemes  can  tolerate  high  error  rates  in  the  guard 
space. 
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Erasure-and-error  decoding  is  a  general  form  of  decoding  for 
reliable  communications  and,  at  the  same  time,  the  basis  of 
important  channel  coding  schemes  such  as  coded  (or  hybrid) 
ARQ  and  concatenated  coding.  There  are  several  schemes  dis¬ 
cussed  in  the  context  of  information  theory.  Those  are  Forney’s 
scheme,  the  threshold  decoding  discussed  in  Gallager’s  textbook, 
the  likelihood-ratio  decision,  the  use  of  error-detecting  codes,  and 
their modifications. Most of the schemes may be described in
terms of a reliability measure Q(y,m), in such a manner that the
decoded message is accepted only if Q(y,m) > T for a specified T
and an erasure is declared otherwise. For example, Q(y,m) =

scheme,  Q{y,m)  =  log 

in  the  threshold  decoding,  and  Q(y,m)  =  log  .) 

in  the  likelihood-ratio  decision.  An  interesting  variation  of  the 
likelihood-ratio  decision  is  Kudryashov’s  scheme  where  Q(y,m)  = 

log - is used. Among these schemes, Forney's
scheme is known to be the best one but requires much computation.
Thus, suboptimal but simpler schemes are preferred
in real applications. However, when we have to select one of
these schemes, we find that the relationship between their
respective performances has not been discussed sufficiently.
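
The accept-or-erase rule Q(y,m) > T can be sketched as follows. The two reliability measures below are the textbook forms commonly associated with Forney's rule and with the likelihood-ratio decision; they are assumptions made for this illustration and are not copied from the paper. `likelihoods[i]` stands for P(y | message i).

```python
import math


def forney_measure(likelihoods, m):
    """log of P(y|m) over the sum of the likelihoods of all other messages."""
    others = sum(p for i, p in enumerate(likelihoods) if i != m)
    return math.log(likelihoods[m] / others)


def likelihood_ratio_measure(likelihoods, m):
    """log of P(y|m) over the largest likelihood among the other messages."""
    best_other = max(p for i, p in enumerate(likelihoods) if i != m)
    return math.log(likelihoods[m] / best_other)


def decode_with_erasure(likelihoods, threshold, measure):
    """Return the most likely message, or None (erasure) if Q does not exceed T."""
    m_hat = max(range(len(likelihoods)), key=lambda i: likelihoods[i])
    if measure(likelihoods, m_hat) > threshold:
        return m_hat
    return None
```

For the same threshold the two measures can disagree: with likelihoods 0.6, 0.3 and 0.1 and T = 0.5, the first measure declares an erasure while the second accepts message 0. This is why the tradeoff between erasure and undetected error has to be compared at matched operating points.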

In this paper, we consider upper bounds on P_ers, the erasure
probability, and P_uer, the undetected error probability, of the
respective schemes and compare them in a systematic manner. Known
bounds for the respective schemes are insufficient for our purpose.
For example, most of them are presented in terms of different
exponent functions, the performance is underestimated in the case of
the likelihood-ratio decision, there is some confusion concerning
the analysis of threshold decoding as seen in Gallager's textbook,
and the performance bound of the scheme based on error-detecting
codes has not been considered in the Shannon-theoretic context. One
reason for some of these shortcomings is that the performance of
an erasure-and-error decoding scheme is frequently considered in
terms of the asymptotic error exponent as the blocklength goes to
infinity and, at the same time, the exponent of the erasure
probability goes to zero. Even though this gives the best theoretically
attainable error exponent, it usually has nothing to do with the
observable performance because of the decoder complexity.

We consider the performance bound of a given scheme in the
following form (or a minor variation of it):

P_uer < exp{-N E_scheme(R, R_0)}

for R_0 satisfying

P_ers < exp{-N E_sp(R_0)},

where R is the coding rate and N is the block length. We call
E_scheme the error exponent of the scheme. This form of the bound
is really required in many applications since the tradeoff between
P_ers and P_uer is an important problem.

In  the  discussion,  we  carefully  distinguish  several  threshold  de¬ 
coding  schemes.  Although  the  same  Q(y,m)  is  used,  there  are  the 
weak  threshold  (Th)  decision,  the  strong  threshold  (STh)  decision 
where  the  uniqueness  of  the  decision  is  required,  and  the  max¬ 
imum  likelihood  threshold  (MlTh)  decision  where  the  maximum 
likelihood  m  is  tested.  These  show  somewhat  different  character¬ 
istics  and  performances. 

As a result of the analysis, we show that E_Th(R,R_0) <
E_STh(R,R_0), that E_STh(R,R_0), E_MlTh(R,R_0), and E_Ed(R,R_0)
for the scheme based on error-detecting codes have almost the
same bound, and that E_Forney(R,R_0) is strictly superior to the
rest. An interesting point is that, for a binary symmetric channel,
E_Lr(R,R_0) for the likelihood-ratio decision and E_Kudryashov(R,R_0)
are very close to E_Forney(R,R_0). A reason for the last result may be
that Q(y,m) is basically the ratio of likelihood functions in these
schemes.
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Abstract 

Suppose one is given M (possibly corrupted) codewords from M (possibly
different) codes, each over F_q; suppose further that the codewords
have a single symbol in common. The common-symbol decoding
problem is that of estimating the symbol in the common position. In
[1], a solution to this problem was presented for a very restricted case.
This talk presents a general solution that contains the familiar
one-step majority-logic decoding as a special case. This algorithm leads
naturally to a decoder structure suitable for fault-tolerant decoding
of cyclic block codes; the resulting architecture undergoes graceful
degradation with increasing component failures. Bounds on decoder
performance under various kinds of partial decoder failures are
presented.

Summary 

One-step majority-logic decoding [2] is one of the simplest
algorithms for decoding cyclic block codes. However, it is an effective
decoding scheme for very few codes. In [1], the authors presented a
generalization based on the following common-symbol decoding
approach.

Given C_1, C_2, ..., C_M, a set of M linear block codes over F_q, let
c_1 ∈ C_1, c_2 ∈ C_2, ..., c_M ∈ C_M be a set of codewords with the
first symbol in common, i.e., c_{1,1} = c_{2,1} = ... = c_{M,1}. The
symbols making up the codewords are transmitted over a channel (the
common symbol only once), and errors occur. Let r_i = c_i + e_i for all
i = 1, 2, ..., M be the received, corrupted codewords. (Of course,
r_{1,1} = r_{2,1} = ... = r_{M,1}.)


Fig  1.  M  codewords  sharing  a  symbol  being  transmitted  over  a  channel. 

It was shown in [3] that the common symbol can be estimated
correctly provided no more than ⌊(δ - 1)/2⌋ errors occur in all the M
codewords. Furthermore, if δ is even and δ/2 errors occur, then such
an event can be detected. Here,

δ = Σ_{j=1}^{M} δ_j - (M - 1), where δ_j = min { d_H(x, y) : x, y ∈ C_j, x_1 ≠ y_1 }.

That is, δ_j is the minimum Hamming distance between codewords in
C_j that differ in the first (common) position. The common-symbol
decoding problem is that of performing this decoding. (In [1], a solution
was provided for the simple case of q = 2, M = 2, and δ_1 = δ_2.)
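
The quantity δ can be computed by brute force for small codes. The sketch below is only an illustration of the definition above: the helper names are assumptions, the codes are enumerated exhaustively (so it is usable only for tiny examples), and q is assumed prime so that arithmetic mod q is field arithmetic.

```python
from itertools import product


def span(generator_rows, q=2):
    """All codewords of the linear code generated by the rows over F_q (q prime)."""
    n = len(generator_rows[0])
    words = set()
    for coeffs in product(range(q), repeat=len(generator_rows)):
        word = tuple(
            sum(c * row[i] for c, row in zip(coeffs, generator_rows)) % q
            for i in range(n)
        )
        words.add(word)
    return sorted(words)


def delta_j(code):
    """Minimum Hamming distance between codewords that differ in position 0."""
    return min(
        sum(a != b for a, b in zip(x, y))
        for x in code for y in code if x[0] != y[0]
    )


def delta(codes):
    """delta = sum of the delta_j minus (M - 1) for M codes sharing one symbol."""
    return sum(delta_j(c) for c in codes) - (len(codes) - 1)
```

For three length-3 single parity-check codes sharing their first symbol, each δ_j is 2, so δ = 4 and ⌊(δ - 1)/2⌋ = 1 error can be corrected, which matches what three orthogonal parity checks give in one-step majority-logic decoding.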

Let D_i be a bounded-distance decoder for the code C_i, defined as

D_i(r_i) = argmin_{x ∈ S_i(r_i)} d_H(x, r_i) if S_i(r_i) ≠ ∅; γ otherwise,

where

^Supported  in  part  by  National  Science  Foundation  grant  NCR-89S7623;  also 
by  the  NSF  Engineering  Research  Centers  Program,  CDR-8803012. 


s,(z)  ={y  G  C, :  dH(y.z)  <  ^-5--  •  ordnly.*)  =  y.j/1  2]}. 

That is, (i) if there exist codewords within ⌊(δ_i - 1)/2⌋ of the received
n_i-tuple r_i, then D_i maps r_i onto the closest of them; (ii) if not, and δ_i
is even, then it erases the first symbol and tries again; (iii) if both of
the above fail, then the error is uncorrectable (indicated by mapping
to γ). Next, define the following:

(i)  U={i;A(r.)  =  7}; 

(u)  e;  =  r, -P(r,);  rj,  =  c..i;  r,  =wt(e,).  V;  ^  U; 

(iii)  Ea  =  {)  ;  i  ^  U. 23,(r,)  =  a}.  VaeF,; 

A  f  0  a  =  0. 

I  l{*  •  4  is  even,  T;  =  ^/2.  n,  /  a}(  otherwise. 

(v)  A'(a)=^T,  -h  Y.  (5. -T.) 

iGEo  Eo  i€U 

-I- ffo l)wt(a).  Va€F,. 

Common-Symbol Decoding Algorithm: Assume no more
than ⌊δ/2⌋ errors have occurred. Then, if there exists a unique a =
a* ∈ F_q that minimizes N(a), the error in the common position is
given correctly as a*. Moreover, if there does not exist such a unique
a*, then δ is even, and exactly δ/2 errors have occurred.

For the case of one-step majority-logic decoding, the J orthogonal
parity checks correspond to simple parity-check codes with δ_i =
2, ∀ i = 1, 2, ..., J, and the algorithm reduces to the familiar
one-step majority-logic decoding.
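
The special case of one-step majority-logic decoding can be made concrete with a small example. The sketch below assumes a length-7 binary cyclic code whose parity checks are the cyclic shifts of the check on positions (0, 1, 3); three of those checks are orthogonal on position 0. The code choice and all names are assumptions made for illustration, not taken from the paper.

```python
N = 7
# Three parity checks that share position 0 and are otherwise disjoint.
CHECKS_ON_ZERO = [(0, 1, 3), (0, 4, 5), (0, 2, 6)]


def majority_logic_decode(received):
    """Correct a single error in a length-7 word by voting over the checks."""
    r = list(received)
    decoded = list(r)
    for p in range(N):
        # Shift the three checks so that they are orthogonal on position p.
        votes = 0
        for check in CHECKS_ON_ZERO:
            votes += sum(r[(p + i) % N] for i in check) % 2
        # A strict majority of failed checks means position p is in error.
        if 2 * votes > len(CHECKS_ON_ZERO):
            decoded[p] ^= 1
    return decoded
```

Each check is a single parity-check code with δ_i = 2, so J = 3 checks give δ = 2·3 - 2 = 4 and one correctable error. An error at position p fails all three checks on p, while an error elsewhere fails only one of them.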

The common-symbol decoding problem may be applied to decoding
cyclic block codes because of the following observation: for many
large, powerful codes, it is possible to break up any codeword from the
large code into codewords (from smaller, weaker codes) that share
a single symbol in common, just as one-step majority-logic decoding
may be viewed as breaking up a codeword into a number of codewords,
each from a simple parity-check code, with a single symbol in common.

This suggests the following distributed approach to decoding
cyclic codes: (i) break the received (possibly corrupted) codeword
from the powerful code into M (possibly corrupted) codewords from
smaller codes; (ii) decode these (smaller) codewords in parallel; (iii)
pool the results of the individual decoders (the η_i's, τ_i's, and U) to
decode the symbol in the common position; (iv) repeat with cyclic shifts
to decode all symbols of the code.

Fault-Tolerant Decoding: Suppose, of the M decoders, the
ith decoder were to fail (i.e., it produces unreliable values for η_i and
τ_i). Then if this fact is known, simply ignoring the result of the ith
decoder allows us to correct the common symbol provided no more
than ⌊(δ - δ_i)/2⌋ errors have occurred. However, if this fact is not
known, then only ⌊(δ - 1)/2⌋ - δ_i errors can be corrected. Further,
if η_i is reliable but τ_i is not (and the decoder is unaware of this), then
⌊(δ - δ_i - 1)/2⌋ errors can be corrected.
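
The four correction bounds quoted above can be tabulated with a few lines of arithmetic. The function and the mode labels are assumptions introduced for this sketch; the formulas are the ones stated in the text.

```python
def correctable_errors(deltas, failed=None, mode="none"):
    """Number of correctable errors for M component codes with the given delta_i.

    mode: "none"     - all decoders work
          "known"    - decoder `failed` is faulty and this is known
          "unknown"  - decoder `failed` is faulty and this is not known
          "tau_only" - only tau of decoder `failed` is unreliable (unknown)
    """
    d = sum(deltas) - (len(deltas) - 1)
    if mode == "none":
        return (d - 1) // 2
    if mode == "known":
        return (d - deltas[failed]) // 2
    if mode == "unknown":
        return (d - 1) // 2 - deltas[failed]
    if mode == "tau_only":
        return (d - deltas[failed] - 1) // 2
    raise ValueError("unknown mode: " + mode)
```

For seven component codes with δ_i = 2 each, δ = 8: three errors are correctable with all decoders working or with one known failure, two if only τ_i is unreliable, and one if a full decoder failure goes undetected. This is the graceful degradation the summary refers to.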
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ABSTRACT 

Recently, it was shown how to determine the error locator polynomial of
a primitive, binary t-error-correcting BCH code in a single step [5]. For
this purpose it is necessary to transform the set of t syndrome polynomial
equations into an equivalent set of polynomial equations, leading to an
analytic expression for the error locator polynomial, σ(x). These results
facilitate decoding beyond the BCH bound, i.e. correcting more than t
errors. This requires resolving additional syndrome coefficients,
which is achieved in a simple and elegant way by means of the expression
derived for the error locator polynomial σ(x).

SUMMARY 

Primitive, binary BCH codes are attractive because of their relative
simplicity and good performance. An (n, k) BCH code has k information
bits per codeword of length n, is defined over GF(2^m), with n = 2^m - 1,
and can correct t = ⌊(d - 1)/2⌋ errors, where d is the minimum distance of
the BCH code. Decoding of the received codeword requires three steps: (i)
computation of a syndrome vector whose 2t components belong to
GF(2^m); (ii) calculation of an error locator polynomial of degree t or less
over GF(2^m); (iii) finding the error locations by solving for the roots of
the error locator polynomial.

In this paper we are concerned with the second step, which is the most
time-consuming one and difficult to implement in hardware.
Conventionally, this requires the use of the well-known Berlekamp-
Massey algorithm, or the Euclidean algorithm [1]. The aim of this paper
is to discuss methods for closed solutions to step (ii), which are easily
implementable in software and hardware.

Consider a primitive, binary BCH code of length n = 2^m - 1 over
GF(2^m), with α a primitive element of the field. The generator
polynomial, which defines the code, has roots at α, α^2, α^3, ..., α^{2t},
enabling the code to correct t errors. The decoder evaluates the received
codeword at α^j to determine the j-th syndrome:

S_j = r(α^j) = Σ_{i=0}^{n-1} r_i α^{ij}

where r_i is the i-th bit of the received vector r(x). It is fairly easy to
show that only t of the 2t syndrome components are independent [1].
The error polynomial is given as:

e(oi)  •  e,a'  *  e^^  *  ■  ♦  e,a-  =  ; i  •  1 , 2, ... ,  r 

For binary BCH codes the error magnitudes are e_i = 1. For notational
convenience, let x_i = α^i, which leads to the following system of algebraic
polynomial equations, which we shall refer to as F:

x_1^j + x_2^j + ... + x_t^j = S_j


For the correction of one or more errors, we must solve this system of
algebraic equations to determine the error locator polynomial σ(x),
whose roots are the required error locations [1].

Following Cooper [3], [4], let x = (x_1, x_2, ..., x_t) and K = GF(2^m); then
F ⊂ K[x], where K[x] is the ring of polynomials in t variables. Let I be the
ideal generated by F. I(F) is spanned by F and is the ideal of all polynomials
which vanish at a set of points in K. By applying a reduction
process it is possible to transform the set F into another set G, which is
easier to solve. The resulting set G is a triangularized set of equations,
which contains the required error locator polynomial σ(x) [3]. It is
noteworthy that the derived expression for the error locator polynomial
is independent of a particular code or any specific finite field, making the
result particularly useful for practical application.

When  implemented  carefully,  Buchberger's  algorithm  (2)  has  complexity 
0(t),  else  0{t^),  which  is  still  manageable  for  values  of  r  S  7.  The 
results  for  r  =  2  and  3  are  as  follows  [3]: 

t = 2 :  σ(x) = x^2 S_1 + x S_1^2 + S_1^3 + S_3

t = 3 :  σ(x) = x^3 (S_1^3 + S_3) + x^2 (S_1 S_3 + S_1^4) + x (S_5 + S_1^2 S_3) + (S_1^6 + S_1^3 S_3 + S_1 S_5 + S_3^2)
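
The closed-form locator for t = 2 can be checked numerically. The sketch below is an illustration under stated assumptions: it works in GF(2^4) built from the primitive polynomial x^4 + x + 1 (so n = 15), uses the expression σ(x) = x^2 S_1 + x S_1^2 + S_1^3 + S_3, and assumes that one or two errors have occurred. All function names are hypothetical.

```python
M_BITS = 4
N = (1 << M_BITS) - 1          # code length n = 2^m - 1 = 15
PRIM_POLY = 0b10011            # x^4 + x + 1, an assumed primitive polynomial

# Exponential and logarithm tables for GF(2^4): EXP[i] = alpha^i.
EXP = [0] * (2 * N)
LOG = [0] * (N + 1)
value = 1
for i in range(N):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & (1 << M_BITS):
        value ^= PRIM_POLY
for i in range(N, 2 * N):
    EXP[i] = EXP[i - N]


def gf_mul(a, b):
    """Multiply two field elements using the log tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_pow(a, e):
    """Raise a field element to a non-negative integer power."""
    if a == 0:
        return 0
    return EXP[(LOG[a] * e) % N]


def syndrome(received_bits, j):
    """S_j = r(alpha^j) = sum over i of r_i * alpha^(i*j)."""
    s = 0
    for i, bit in enumerate(received_bits):
        if bit:
            s ^= EXP[(i * j) % N]
    return s


def sigma_t2(x, s1, s3):
    """Evaluate x^2 S_1 + x S_1^2 + S_1^3 + S_3 (addition is XOR)."""
    return (gf_mul(gf_mul(x, x), s1)
            ^ gf_mul(x, gf_mul(s1, s1))
            ^ gf_pow(s1, 3)
            ^ s3)


def locate_errors_t2(received_bits):
    """Positions i for which alpha^i is a root of the t = 2 locator."""
    s1 = syndrome(received_bits, 1)
    s3 = syndrome(received_bits, 3)
    return [i for i in range(N) if sigma_t2(EXP[i], s1, s3) == 0]
```

Because the code is linear, the all-zero word is a codeword, so an error pattern alone is enough to exercise the formula. Only S_1 and S_3 are needed, which reflects the remark that just t of the 2t syndromes are independent for a binary code.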

Deriving an analytical expression for the error locator polynomial has
several advantages. Apart from the obvious reduction of the
computational complexity of the decoding algorithm, the analytical
solution also allows us to decode beyond the BCH bound, since it is
possible to express σ(x) in terms of the unknown syndrome
coefficient(s). The expression can then be resolved by applying the
approach suggested in [5].
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\\>  coii.-iiilrr  digital  transmission  of  a  memoryless  Gaussian 
source over a slow fading Rayleigh channel. Our objective is to
demonstrate a useful application of the multiple description scalar
quantizer (MDSQ) [1] and to render comparisons against a maximum
ratio combiner (MRC)-based system as well as channel code
based systems.

MDSQ-based  system 

The MDSQ is constructed for the following idealized model for
a dual diversity system. Assume that two independent channels
are available for transmitting information from a continuous-alphabet,
discrete-time source. Each channel may be in a working
or non-working state; this is known in the receiver but not in the
transmitter. When working, each channel can support a rate of
R bits/source sample. The encoder of an N-level MDSQ maps a
source sample to an index pair (i, j). Both indices are mapped to
R-bit codewords, where 2^{2R} ≥ N. The codeword corresponding
to index i (j) is then sent over the first (second) channel. The
quantizer is designed [1] so as to minimize the mean squared error
(MSE) when both channels work, subject to constraints on the
MSE when either only the first channel works or only the second
channel works. Information theoretic bounds on performance are
derived in [2], [3] for a memoryless Gaussian source.

The MDSQ is applied to the Rayleigh fading channel as follows.
The bits corresponding to index i are temporally separated from
those of index j by an interleaver operating at the R-bit word level
so as to obtain two independently faded channels. Individual bits
are transmitted over the channel using a BPSK modulator. Soft
decision demodulation is used in the receiver and it is assumed
that the receiver has perfect knowledge of the Rayleigh fading
parameter (channel state information (CSI)). The fading process is
assumed to be sufficiently slow so that the Rayleigh parameter
remains fixed over R bits. An index is declared to be reliable if
the corresponding Rayleigh parameter exceeds a certain threshold.
The threshold is optimized to maximize the output SNR. In
Fig. 1, we plot the output SNR vs. the channel SNR for a 31-level
MDSQ with R = 4.0 bits/sample (31MDSQ).

Reference  Systems 

The first reference system is a maximum ratio combiner (MRC)
based dual diversity system. A source sample is encoded by a
Lloyd-Max (LM) quantizer to an R-bit codeword. Each R-bit
codeword is duplicated and then temporally separated by an
interleaver operating at the R-bit level and transmitted using BPSK
modulation. Assuming perfect CSI about the fading parameter is
available at the receiver, the outputs of the two channels are fed
to an MRC. The recovered bits are then mapped to Lloyd-Max
quantizer reconstruction levels. In order to make an equal bandwidth
comparison with the MDSQ system, we consider a 16-level
LM quantizer and R = 4.0 bits/sample. The performance of the
MRC based system is shown in Fig. 1 (16LM-MRC).
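
Maximum ratio combining of the two diversity branches can be sketched for a single BPSK bit as follows. This is a minimal illustration that assumes real, known fading gains (perfect CSI) and equal noise variance on both branches; the function name and the numbers in the usage note are assumptions.

```python
def mrc_bpsk_decision(received, gains):
    """Combine diversity branches for one BPSK symbol and decide its sign.

    received[k] is modeled as gains[k] * s + noise with s in {+1, -1};
    each branch is weighted by its own (known) fading gain.
    """
    statistic = sum(g * r for g, r in zip(gains, received))
    return 1 if statistic >= 0 else -1
```

Suppose s = -1 is sent, the gains are 0.1 and 1.0, and the received samples are 1.3 and -0.8. Adding the branches with equal weight gives +0.5 and the wrong decision, whereas the gain-weighted sum is negative, so the deeply faded branch no longer dominates.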

Two channel code-based systems are considered. In both, a
source sample is mapped by the encoder of a 32-level LM quantizer
to a 5-bit codeword. Two consecutive 5-bit codewords are
concatenated, a 0 is appended and fed to an extended (16,11)
Hamming coder. The two systems differ in the interleaver that
is used. In the partially interleaved system (PICC), a channel
codeword is interleaved at the 8-bit level. In the fully interleaved
system (FICC), interleaving is performed at the bit level. In both systems
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BPSK  modulation  is  used,  and  perfect  CSI  is  assumed  in  the  soft- 
decision demodulator/decoder. The recovered bits are mapped to
LM reconstruction levels. Note that the bandwidth requirements
are identical to those of the MRC- and MDSQ-based systems.
Performance results are shown in Fig. 1.

Performance  Results 

First note that the MDSQ-based system significantly outperforms
the MRC-based system. Improvements in output SNR at
high channel SNRs of over 5 dB are obtained without any sacrifice
in the usable channel SNR. Also note that the MDSQ system
significantly outperforms the PICC system. The FICC system
does better than all other systems. However, the price paid is an
excessive increase in the interleaving delay for the FICC system as
compared to the MDSQ, MRC or PICC systems. For example, consider
a mobile radio moving at 30 mph. If we assume a temporal
separation of 5 ms to obtain independent fades [5], the end-to-end
delay for the MRC, MDSQ and PICC systems is roughly 5 ms.
However, for the FICC system the corresponding delay is roughly
75 ms since interleaving is performed at the bit level. Thus the
MDSQ system improves the performance over an MRC system
without an increase in the interleaving delay. This is important in
two-way speech communication systems where delay must be
allocated between the source coder, channel coder and interleaver
so as to meet a delay budget.
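
One way to read the delay figures is the arithmetic below. It is an interpretation rather than something stated in the text: it assumes a 5 ms separation between consecutive interleaved units and that a block split into u units spans u - 1 separations.

```python
def interleaving_delay_ms(units_per_block, separation_ms):
    """Span between the first and last unit of a block after interleaving."""
    return (units_per_block - 1) * separation_ms
```

Under these assumptions a 16-bit codeword interleaved bit by bit spans 15 separations, giving the 75 ms quoted for the fully interleaved system, while two words per block (the duplicated codeword, the index pair, or two 8-bit halves) span a single separation.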
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1.  General 

The correlated Rician channel is a useful model for a slowly-fading
channel, in which the complex fading process is composed of two
quadrature Gaussian processes with a given normalized autocorrelation
function ρ(τ), and a corresponding symmetrical power spectral
density. For slow fading the correlation between adjacent symbols is
relatively high and might approach 1. A single-ray model, that is, flat
fading, is assumed throughout.

We investigate the achievable error probabilities over the channel,
employing coherent detection and ideal side information on the
realization of the fading processes at the receiver. An underlying decoding
delay constraint which precludes the use of (ideal) interleaving is
assumed.

In [1], an upper bound on the error probability of block- or
convolutionally-coded BPSK over this channel (with similar assumptions
on the receiver) was presented. The fading process in [1] is
assumed to be piecewise constant (p.c.), that is, considered to be constant
over a symbol's duration. We analyze the correlated fading channel
both with and without the above mentioned p.c. approximation. Coded
BPSK performance, as well as the exponential behavior of the error
probability, are discussed. For the continuous channel it is assumed
that the receiver has an accurate knowledge of the sample path of
the realized fading process. We focus on obtaining the limit of
ultimate performance and on verifying under what conditions the above
mentioned p.c. approximation is adequate. Comparisons to the
block-fading model are also discussed.

2.  The  Piecewise-Constant  Approximated  Channel 

2.1 Coded BPSK performance: A succession of upper bounds on
the pairwise error probability (P(c → ĉ)) is reviewed (based on [1]).
Two conjectures of [1] are rigorously proved.

Theorem: For a channel with ρ'(τ) > ρ(τ) the performance is
uniformly inferior in comparison to the performance over the channel with
ρ(τ), in terms of the upper bound on the pairwise error probability.

Corollary: For the exponentially correlated channel (ρ(nT) = a^n),
among all codewords of distance d from the transmitted one, the worst
upper bound on the pairwise error probability happens when all the
erroneous (or different) symbols are consecutive.
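
The effect described in the theorem and corollary can be illustrated numerically. The sketch below does not reproduce the paper's bound: it assumes the zero-mean (Rayleigh) special case with perfect side information, where a Chernoff-type bound on the pairwise error probability is proportional to 1/det(I + γR), with R the fading correlation matrix at the differing symbol positions and γ an assumed per-symbol SNR. The correlation model ρ(nT) = a^|n| and all names are assumptions.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            d = -d
        d *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return d


def pairwise_bound(positions, a, snr):
    """1 / det(I + snr * R) with R[i][j] = a ** |positions[i] - positions[j]|."""
    n = len(positions)
    m = [[(1.0 if i == j else 0.0)
          + snr * a ** abs(positions[i] - positions[j])
          for j in range(n)] for i in range(n)]
    return 1.0 / det(m)
```

With a = 0.9 and γ = 10, three consecutive differing symbols give a bound several times larger than three symbols spaced ten positions apart, in line with the statement that consecutive errors are the worst case. Setting a = 0 recovers the independent-fading product 1/(1 + γ)^d.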

2.2 Exponential behavior of the error probability: A general
upper bound on the average message error probability (P_e) for random
coding and i.i.d. Gaussian inputs is presented, along with a tighter
bound for the exponentially correlated channel where the fact that
the fading process is Markovian is invoked [2]. Evaluation of the
exponential behavior of P_e for finite code lengths is addressed. It is shown
that under stringent decoding delay constraints, and a very slow fading
process (compared to the user baud rate), reliable transmission of high
information rates is hard to achieve. However, under mild technical
conditions, when the decoding delay constraint is relaxed and when
ρ(nT) → 0 for n → ∞, the classical Shannon capacity (that is, the
ultimate achievable rate) exists and it is given by the average mutual
information.


3.  The  Continuous  Channel  Model 

The pairwise error probability for coded BPSK depends on the
squared Euclidean distance between the two faded codewords; its
evaluation is pursued here.

3.1 Ideal interleaving: The Chernoff upper bound on P(c → ĉ)
is determined in terms of the Fredholm determinant [3] (associated
with ρ(τ)). The latter is evaluated based on the representation of the
fading processes by a Karhunen-Loeve expansion. The Bhattacharyya
distance is compared to the p.c.-approximated channel; the "inherent
diversity" embedded in the continuous model (with perfect side
information) is pointed out.

3.2 No interleaving: Here, the upper bound on $P(c \to \hat{c})$ depends
on the Fredholm determinant for the 'filtered' fading process, namely
the process multiplied by the window function $\lambda(t)$, which is $\neq 0$
over the symbols which do not agree in $\hat{c}$ and $c$. When the different
symbols are consecutive, the evaluation is straightforward. In the
general case, the Fredholm determinant of a complex Gaussian process
filtered by a time-varying filter should be evaluated. To solve for the
Fredholm determinant, we use a state-space representation of the system
and Collins' modification to the Riccati equation [3], and obtain a
closed-form solution for an interesting example.
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Abstract  A  method  is  described  for  obtaining  tight  closed- 
form bounds on the probability of error for M-ary differential
phase-shift keying and Rician fading diversity channels. The
channels exhibit doubly selective fading. In addition, the gains
of the diversity channels are correlated. As an example, the
results of applying this technique are given for 16-ary DPSK
signaling. 


I.  Summary 

Differential phase-shift keying (DPSK) modulation and diversity
transmission have long been employed for communications over
fading, multipath channels. In recent years, the desire for high
bandwidth efficiency and the introduction of multi-phase coding
schemes have led to consideration of M-ary DPSK with M > 1.

Previous results [1] for differential binary PSK signaling over
doubly selective fading diversity channels can be extended to
differential quadriphase shift keying. Performance of M-ary DPSK
is analyzed in [2] for doubly selective fading diversity channels.
An iterative expression for the error probability is given for
arbitrary symbol set size and an arbitrary order of diversity
combining. The results of [2] are applicable only to Rayleigh fading
diversity channels that are modeled by independent, identically
distributed random processes. In this paper we present a method
for determining performance bounds when the diversity channels
exhibit correlated Rician fading.

It has been shown [3] that for M-ary DPSK, the bit error
probability can be bounded in terms of the probability that the
complex-valued decision statistic falls within any of several
specified half-planes and quarter-planes. However, for the system
and channels under consideration in this paper, the probability
that the decision statistic falls within an arbitrary quarter-plane
cannot be expressed in closed form. Thus, we use only bounds
obtained by considering half-planes.

The receiver employs differentially coherent detection and
square-law diversity combining, as in [2]. The diversity channels
are modeled as jointly Gaussian wide-sense-stationary uncorrelated
scattering fading channels. The probability that the decision
statistic falls within a specified half-plane is equal to the
probability that a certain Hermitian quadratic form in jointly
Gaussian random variables is less than zero. This probability
is evaluated by the approach of [1]. Thus, tight bounds on the
error probability are obtained for M-ary DPSK and a general
class of fading diversity channels.

An  application  of  the  results  of  this  paper  is  illustrated  in  the 
figure,  where  upper  and  lower  bounds  on  the  bit  error  probability 
are  given  for  16-ary  DPSK  signalling  and  dual  diversity  combin¬ 
ing.  Each  diversity  channel  is  a  doubly  selective  fading  channel. 
The  delay  spectrum  of  each  channel  is  rectangular  with  a  nor¬ 
malized  rms  delay  spread  of  0.1  and  the  time-correlation  func¬ 
tion  is  exponential  with  a  normalized  Doppler  spread  of  0.0001. 
The signal-to-noise ratio is the ratio of the mean received signal
energy  to  the  noise  power  density. 
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Abstract 

We  consider  the  limiting  channel  cutoff  rate  for  memory¬ 
less  phase-only  modulation  operating  on  the  slow-fading  Rician 
channel  for  large  alphabet  size,  M.  Previous  work  has  con¬ 
sidered  evaluation  of  the  cutoff  rate,  Ro,  in  bits/channel  use, 
for  multiple  phase-shift  keyed  (MPSK)  modulation  with  fixed 
M  as  a  function  of  the  channel  parameters,  as  well  as  sys¬ 
tem  implementation  details  such  as  whether  or  not  interleav¬ 
ing/deinterleaving  or  channel  state  information  (CSI)  is  used. 
Here  we  evaluate  the  limiting  Ro  when  the  restriction  of  a 
fixed  alphabet  size,  M,  is  removed  but  the  transmitted  sig¬ 
nal  is  restricted  to  utilize  memoryless  phase-only  modulation. 
Results  are  provided  under  the  assumptions  of  ideal  interleav¬ 
ing/deinterleaving  both  with  perfect  and  no  CSI.  In  the  case  of 
no  CSI  we  show  that  there  exists  a  maximum  useful  signaling 
rate which depends only upon the ratio of specular-to-diffuse
energy. 

Summary 

In previous work [1] we have evaluated the channel cutoff
rate for multiple phase-shift keyed modulation (MPSK) operating
on the slow-fading Rician channel. Here, the cutoff rate,
$R_0$, in bits/channel use, was evaluated for fixed alphabet size,
M, as a function of channel parameters as well as system
implementation details. The latter include whether or not
interleaving/deinterleaving or channel state information (CSI) is
employed. It is of some interest to determine the limiting cutoff
rate performance under the same conditions when the restriction
of fixed M is removed but the transmitted signal is restricted
to utilize memoryless phase-only modulation.

The capacity under a memoryless phase-only constraint is
described in [2] for operation on the additive white Gaussian
noise (AWGN) channel. No work to the author’s knowledge has
considered the corresponding cutoff rate, under a phase-only
constraint, on the AWGN channel, let alone a fading channel.


This work was supported in part by DARPA under Contract
No. F30602-92-C-0030.


In  the  present  paper  we  provide  the  derivation  and  numerical 
evaluation  of  the  cutoff  rate  for  phase-only  modulation  on  the 
slow-fading  Rician  channel  under  the  assumption  of  ideal  inter¬ 
leaving/deinterleaving  and  for  the  two  extremes  of  perfect  and 
no  CSI.  The  channel  modeling  assumptions  include  the  AWGN 
as  a  special  case.  Indeed,  in  this  case  we  demonstrate  that 
for large $E_b/N_0$ the cutoff rate is approximately 1.68 dB from
the  asymptotic  capacity  determined  in  [2].  Thus,  at  least  in  the 
AWGN  case,  conclusions  based  on  capacity  arguments  are  mim¬ 
icked  by  the  corresponding  cutoff  rate  results.  This  is  heartening 
since,  as  first  argued  by  Massey  [3], [4],  the  cutoff  rate  has  come 
to  be  accepted  as  the  practical  upper  limit  on  channel  signaling 
rates  for  which  arbitrarily  high  reliability  can  be  expected. 
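
To make the notion of a limiting cutoff rate concrete, the sketch below evaluates the standard cutoff-rate expression for equiprobable MPSK with coherent detection on the AWGN channel, $R_0 = \log_2 M - \log_2 \sum_{k=0}^{M-1} \exp\{-(E_s/N_0)\sin^2(\pi k/M)\}$. It is only an illustration of the AWGN special case with fixed M, not the derivation for the Rician channel reported here; the function name and the choice of $E_s/N_0$ in dB as the input are assumptions made for the example.

```python
import math

def mpsk_cutoff_rate(M, es_n0_db):
    """Cutoff rate R0 (bits/channel use) for equiprobable M-PSK on AWGN.

    Assumes coherent detection and unquantized channel outputs.
    """
    es_n0 = 10.0 ** (es_n0_db / 10.0)
    # Two points k phase steps apart have squared distance 4*Es*sin^2(pi*k/M),
    # so their Bhattacharyya factor is exp(-(Es/N0) * sin^2(pi*k/M)).
    total = sum(math.exp(-es_n0 * math.sin(math.pi * k / M) ** 2)
                for k in range(M))
    return math.log2(M) - math.log2(total)

if __name__ == "__main__":
    for M in (2, 4, 8, 16, 64, 256):
        print(M, round(mpsk_cutoff_rate(M, 10.0), 4))
```

At a fixed signal-to-noise ratio the printed values stop growing once M is moderately large, which is the saturation that motivates studying the limit in which the fixed-M restriction is removed.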

The  more  general  results  for  an  arbitrary  slow-fading  Ri¬ 
cian  channel  are  likewise  useful  in  assessing  modulation/coding 
tradeoffs  on  representative  fading  channels.  For  example,  for 
the  case  of  no  CSI  we  show  that  there  exists  a  maximum  useful 
signaling rate which depends only upon the ratio of specular-
to-diffuse energy. Thus, regardless of the alphabet size, M, the
channel throughput cannot be improved by increasing $E_b/N_0$ as
is the case with perfect CSI.
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Abstract 

A  suboptimal  breadth-first  multiple-path  bidirectional  decoding 
algorithm  for  convolutional  codes  has  been  shown  to  provide  very 
attractive  error  performances  over  the  binary  symmetric  channel.  In 
this  paper,  new  computer  simulation  results  for  bidirectional  decod¬ 
ing  of  convolutional  codes  over  soft-decision  Rayleigh  fading 
channels are presented. Using a memory length v = 19 and rate
R = 1/2 code, these results show that a gain near 5 dB can be
achieved for a low frame error probability ($P_F < 10^{-3}$) over the
Viterbi algorithm of equivalent decoding complexity
(v = 6, R = 1/2). The results also indicate that, depending on the
length  of  the  frames,  a  significant  gain  can  also  be  achieved  for  low 
bit  error  probability  (Pj,  <  10”^). 

Summary 

The Viterbi algorithm is widely used for the decoding of
convolutional codes [1-3]. This optimal algorithm [4] exhaustively
searches all states of the trellis and delivers the most likely
information sequence given the received symbols. The major drawback
of this technique is that its complexity increases exponentially with
the memory of the code, restricting its use, for practical reasons,
to short memory codes (v < 7). For the decoding of longer
memory codes, suboptimal decoding procedures must be considered.

The M-Algorithm [5] and other breadth-first decoding
algorithms [6,7] have a constant computational load which is guided by
the  number  of  paths  explored,  M,  instead  of  the  number  of  states  in 
the  trellis.  A  major  drawback  of  these  decoding  techniques  is  the 
lack  of  resynchronization,  leading  to  long  error  events  when  the 
correct  path  is  lost. 
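
The breadth-first idea can be sketched in a few lines. The example below is a minimal, hard-decision M-algorithm for a small rate-1/2 code; the (7,5) octal generators, the memory of 2, the Hamming metric and all function names are assumptions chosen to keep the illustration short, and none of them correspond to the v = 19 code or the soft-decision metric used in the simulations.

```python
def branch_output(reg, gens):
    """Parity of the register bits selected by each generator polynomial."""
    return tuple(bin(reg & g).count("1") % 2 for g in gens)

def encode(bits, gens=(0b111, 0b101), memory=2):
    """Rate-1/len(gens) feedforward convolutional encoder, zero start state."""
    state, out = 0, []
    for b in bits:
        reg = (b << memory) | state      # newest bit sits in the top position
        out.append(branch_output(reg, gens))
        state = reg >> 1                 # drop the oldest bit
    return out

def m_algorithm_decode(received, M, gens=(0b111, 0b101), memory=2):
    """Keep only the M best paths (smallest Hamming metric) at every depth."""
    survivors = [(0, 0, [])]             # (metric, encoder state, decoded bits)
    for r in received:
        extended = []
        for metric, state, path in survivors:
            for b in (0, 1):
                reg = (b << memory) | state
                dist = sum(x != y for x, y in zip(branch_output(reg, gens), r))
                extended.append((metric + dist, reg >> 1, path + [b]))
        extended.sort(key=lambda s: s[0])
        survivors = extended[:M]         # fixed work per step, set by M
    return survivors[0][2]

if __name__ == "__main__":
    data = [1, 0, 1, 1, 0, 0, 1, 0]
    noisy = encode(data)
    noisy[2] = (1 - noisy[2][0], noisy[2][1])   # one channel bit error
    print(m_algorithm_decode(noisy, M=4) == data)
```

The work per decoding step depends only on M, not on the number of trellis states. If the correct path ever falls outside the M survivors it cannot be recovered by this forward-only search, which is the loss of resynchronization noted above.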

The bidirectional decoding algorithm [8] has been shown to be
very effective in reducing the length of the error events caused by
loss of the correct path. This suboptimal algorithm, suited for long
memory codes, uses a fixed number of paths, M, all of equal length, in a
bidirectional breadth-first tree searching manner. By a judicious
sharing of the forward and reverse explorations of the tree, this
decoding technique restricts the extent of the error propagation due
to loss of the correct path. Bidirectional decoding does not introduce
any computational variability and effectively lowers the number of
computations needed to achieve the same bit error rate as a Viterbi
decoder of equivalent decoding complexity, that is, the same number of
path extensions at each decoding step.

In this paper, bidirectional decoding of a v = 19 and rate
R = 1/2 convolutional code with 64 path extensions at each decoding
step (M = 64) is compared using computer simulations to


1. This research has been supported in part by the Natural Sciences
and Engineering Research Council of Canada, the Fonds pour
la Formation de Chercheurs et l’Aide à la Recherche of Québec
and the Canadian Institute for Telecommunication Research under
the National Centers of Excellence program of the Government of
Canada.


Viterbi decoding of a v = 6 and rate R = 1/2 code over Rayleigh
fading channels. Two fading channels have been considered. First,
the  urban  radio  mobile  channel  with  Rayleigh  fading  and  Bessel 
autocovariance  of  the  received  symbols’  energy  is  examined. 
Results  show  that  a  gain  of  3.7  dB  to  4.5  dB  is  achieved  by  the  bidi¬ 
rectional  algorithm  over  the  Viterbi  algorithm  on  a  frame  error 

probability  P^<  10’^  for  normalized  Doppler  frequencies  ranging 
from $F_D T = 0.1$ to $F_D T = 0.005$. These last results have been
obtained  for  a  frame  length  of  L  =  500  information  bits,  but  exten¬ 
sive  simulation  results  indicate  that  the  gain  provided  by  bidirec¬ 
tional  decoding,  for  frame  error  performances,  is  only  slightly 
dependent  on  the  frame  length.  However  the  bit  error  performance 
gains  obtained  by  the  bidirectional  algorithm  over  the  Viterbi  algo¬ 
rithm  are  frame  length-dependent  since  the  bit  error  rate  of  the  bidi¬ 
rectional  algorithm  is  influenced  by  the  length  of  the  data  frame. 
For  frames  of  L  =  500  information  bits,  results  show  that  gains 

ranging  from  0.2  dB  to  2.3  dB  for  a  bit  error  probability  F*  <  10'^ 
can  be  obtained  with  bidirectional  decoding  over  the  Viterbi  algo¬ 
rithm  of  an  equivalent  computational  complexity. 

Computer  simulation  results  for  the  Rayleigh  fading  channel 
with  exponential  autocovariance  will  also  be  presented.  These 
results, over the same normalized Doppler frequency range
($F_D T = 0.1$ to $F_D T = 0.005$), show that the Viterbi algorithm
performs better in this channel than in the previous one, reducing
somewhat the advantage of the bidirectional algorithm over the
Viterbi algorithm. Nevertheless, substantial coding gains in the
frame error performance (2 to 3 dB) can still be achieved and,
furthermore, depending on the length of the frames, an improvement in
the bit error performance (about 1 dB) can also be obtained with
the bidirectional algorithm.
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Abstract 

The use of block coding and errors-and-erasures decoding can
enhance performance in frequency-hop communication systems, provided
that a good scheme is employed to determine which symbols to erase.
The problem of making erasure decisions from collections of receiver
outputs is investigated in this paper. Methods to determine which
received symbols to erase are derived from Bayesian decision theory. The
result  is  a  Bayesian  scheme  in  which  erasure  decisions  are  made  collec¬ 
tively  for  each  codeword.  The  performance  of  this  scheme  is  compared 
with  the  performance  of  another  Bayesian  method  in  which  erasure 
decisions  are  made  independently  from  symbol  to  symbol,  and  both 
are  compared  to  the  performance  of  receivers  that  do  not  erase.  The 
Bayesian  method  with  dependent  erasures  is  found  to  provide  the  best 
performance. 

Summary 

The performance of frequency-hop communication systems that
are subjected to wideband noise and frequency-selective fading is
generally unacceptable without some form of error correction. Erasure
of the least reliable symbols prior to decoding can provide significant
improvement in performance if the communications receiver has some
way to accurately determine the reliability of the received symbols. To
accomplish this, the receiver must generate a statistic that is a measure
of the likelihood that a symbol is in error, and the decoder must use
this statistic effectively to decide whether to erase the symbol. Some
approaches require the transmission of additional redundant symbols
in order to obtain side information for determining which symbols to
erase (see [1] and the references in [2]).

An  alternative  approach  is  to  base  erasure  decisions  on  the  enve¬ 
lope  detector  outputs  of  a  noncoherent  receiver,  without  transmitting 
additional symbols. In [2], Bayesian decision theory is used to obtain
an erasure scheme that minimizes a bound on the probability of not
decoding correctly. With this method, one first computes a function of
the  envelope  detector  outputs  that  correspond  to  a  given  code  symbol 
(this  function  is  essentially  a  reliability  function).  The  result  is  then 
compared  to  a  threshold  to  decide  whether  to  erase  the  corresponding 
symbol. 

There  is  one  significant  drawback  to  this  Bayesian  technique.  Be¬ 
cause  the  erasure  decisions  are  made  independently  from  symbol  to 
symbol,  it  is  possible  that  more  symbols  can  be  erased  than  the  code 
is capable of correcting. One intuitive solution to this problem is to
employ the Bayesian erasure scheme in parallel with errors-only
decoding. This has been proposed with other erasure schemes [1, 3].
Unfortunately, our studies show that negligible performance improvements
result for the Bayesian scheme.

In  this  paper,  we  propose  the  use  of  a  Bayesian  decision  rule  that 
makes  dependent  erasure  decisions.  The  form  of  this  rule  is  obtained 
by  using  a  decision-theoretic  approach  to  minimize  the  probability  of 
not decoding correctly under a model that distorts the prior
probabilities to make all symbol sequences equally likely. The result is
a decision rule that offers significant performance improvements over
the Bayesian scheme with independent erasures with only a modest
increase in complexity.

The system under consideration is similar to the system described
in [4]. Frequency-hop spread-spectrum transmission, noncoherent
demodulation, and an (n, k) extended Reed-Solomon (RS) code are
employed. The modulation is M-ary orthogonal signaling with M = n,
and n is a power of two. The channel includes the effects of Rayleigh
fading as well as white Gaussian noise with spectral density $N_0/2$. The
receiver determines a set of symbols to erase, makes hard decisions
on the other symbols, and then employs errors-and-erasures bounded-
distance decoding.

We let $Y^j$ denote a vector of envelope detector outputs that
correspond to the j-th code symbol in a codeword. The conditional density
of $Y^j$, given that the j-th code symbol sent was $s_i$, is denoted by
$f(y^j | s_i)$. The receiver we propose can be described as follows:

1. If the j-th code symbol is not erased, choose the $s_i$ that maximizes
$f(y^j | s_i)$.

2. If t symbols are to be erased, then they should be the symbols
with the t smallest values of $L(y^j)$, $1 \le j \le n$, where

$$L(y^j) = \max_i f(y^j | s_i) \Big/ \sum_{l=0}^{M-1} f(y^j | s_l).$$

3. For each i, $1 \le i \le n$, let $L_i$ be the i-th smallest element in the
sequence $L(y^1), L(y^2), \ldots, L(y^n)$. Then t symbols are erased if
and only if t minimizes

$$P\big( X_{t+1} + X_{t+2} + \cdots + X_n > \lfloor (n - k - t)/2 \rfloor \big),$$

where $X_i$ is a Bernoulli random variable with parameter $L_i$.
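
The choice of t in step 3 can be sketched as a search over candidate erasure counts, each scored by the tail probability of a sum of independent Bernoulli variables. The sketch below is illustrative only: the function names are hypothetical, and it assumes that $X_i = 1$ marks a symbol error on the i-th most unreliable symbol and occurs with probability $1 - L_i$, which is one reading of "parameter $L_i$" rather than something the text states outright.

```python
def tail_prob(error_probs, threshold):
    """P(sum of independent Bernoulli(p) variables > threshold), by DP."""
    dist = [1.0]                         # dist[m] = P(exactly m errors so far)
    for p in error_probs:
        new = [0.0] * (len(dist) + 1)
        for m, q in enumerate(dist):
            new[m] += q * (1.0 - p)      # this symbol is correct
            new[m + 1] += q * p          # this symbol is in error
        dist = new
    return sum(q for m, q in enumerate(dist) if m > threshold)

def choose_erasures(reliabilities, n, k):
    """Return (t, erased positions, failure probability) for an (n, k) code."""
    order = sorted(range(n), key=lambda j: reliabilities[j])  # least reliable first
    best = None
    for t in range(n - k + 1):           # more than n - k erasures cannot decode
        kept = order[t:]
        # Assumed error probability of a kept symbol: 1 - reliability.
        p_fail = tail_prob([1.0 - reliabilities[j] for j in kept],
                           (n - k - t) // 2)
        if best is None or p_fail < best[2]:
            best = (t, sorted(order[:t]), p_fail)
    return best

if __name__ == "__main__":
    rel = [0.99] * 6 + [0.5, 0.4]        # two clearly unreliable symbols
    print(choose_erasures(rel, n=8, k=4))
```

Because the number of erasures is chosen jointly for the whole codeword, the rule never erases more symbols than the code can tolerate, which is the weakness of the symbol-by-symbol scheme.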

The receiver described above is quite general, and can be applied to a
variety of channel and signaling models. For the channel and system
described above,

$$L(y^j) = \frac{\max_i \exp\{ (y_i^j)^2 / [2\sigma^2 (1 + \sigma^2/\tau^2)] \}}{\sum_l \exp\{ (y_l^j)^2 / [2\sigma^2 (1 + \sigma^2/\tau^2)] \}},$$

where $\sigma$ and $\tau$ satisfy $\tau^2/\sigma^2 = (\log_2 n)(k/n)(E_b/N_0)$. In this
expression, $E_b$ is the average received energy per data bit, and $y_i^j$ is
the value of the envelope detector output that corresponds to $s_i$.

The  performance  of  this  erasure  decision  scheme  is  measured  by  the 
probability of not decoding correctly. For performance comparisons,
we consider the Bayesian method in [2] as well as simple hard-decision
demodulation (no erasures). Our simulation results show that, over a
wide range of error probabilities and with (32,12) and (32,16) RS codes,
the method in [2] provides several dB of performance gain over errors-
only  decoding,  and  the  dependent  erasure  scheme  provides  roughly  an 
additional  0.5  dB  of  gain. 
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The  system  studied  is  a  frequency-hopping  continuous- 
phase frequency-shift keying (digital FM) communication
system  (FH/CPFSK)  that  utilizes  error-control  coding,  inter¬ 
leaving,  and  L  hops/bit  diversity  to  mitigate  the  effects  of  noise 
and  jamming.  In  the  slow-hopping  transmission  scheme,  coded 
binary  data  symbols  are  repeated  on  L  different  hops  in  order 
to  increase  the  likelihood  that  some  of  the  symbols  are  free  of 
partial-band  jamming  interference.  The  coded  symbols  are  first 
read  into  a  Q-symbol  buffer,  where  Q  is  the  number  of  symbols 
that  can  be  transmitted  in  one  hop  period.  After  interleaving, 
L  copies  of  the  Q-symbol  sequence  are  transmitted  on  L  succes¬ 
sive  hops.  The  reception  scheme  uses  a  method  for  combining 
diversity  transmissions  whose  performance  is  to  be  evaluated. 

The conditional bit error performance of a binary FM
communications system employing as a decision variable the sum of
L demodulator output samples $z_i$ ($i = 1, 2, \ldots, L$) that are
independent and identically distributed can be formulated as

Pe(L;  (1) 

where $C_z(\nu; \beta)$ is the characteristic function (CHF) of $z_i$ under
parametric conditions denoted by $\beta$. The unconditional error
probability is found by averaging with respect to the parametric
conditions; for example, $\beta$ can represent the effects of
intersymbol interference (ISI), with the averaging being taken
over different adjacent-bit patterns. This formulation can be
applied to FH digital FM systems in which the decision variable
is a weighted sum of diversity transmission samples that
are subject individually and independently to jamming (as
under partial-band noise jamming of FH signals) by writing

$$C_z(\nu; \gamma; \beta) = (1 - \gamma)\, C_z(w_0 \nu; \rho_N; \beta) + \gamma\, C_z(w_1 \nu; \rho_T; \beta). \qquad (2)$$

In (2), $\gamma$ is the probability that a hop is jammed (fraction of
band jammed); $(\rho_N, w_0)$ and $(\rho_T, w_1)$ are, respectively, the
combinations of SNRs and of weights multiplying the samples
that pertain under non-jamming and jamming conditions; and
$C_z(\nu; \rho_x; \beta)$ denotes the characteristic function for a sample
when the SNR in the signal intermediate-frequency (IF) bandwidth
equals $\rho_x$ ($x = N$ for noise-only, and $x = T$ for
noise-plus-jamming), the bit energy-to-noise power spectral
density ratio, divided by the number of hops per bit.


For  a  differential  detector,  in  [1]  it  is  shown  that  the  CHF 
for  a  demodulator  output  sample  has  the  form 

1 _ 


P.  -,0)  = 


1  +  41''c,C2  -f  2jv(C2  —  c, 


xexp' 


'1  1  + 


—  C3d2)  —  2i^^CjC2(d,  +  dj 


_  (3a) 

-b4i/’ciCj-f  2ji/(cj-c,)  J 

where in terms of the in-band noise variance $\sigma^2$ and its in-phase
and cross-phase correlation coefficients $r$ and $\lambda$

Cl, 2  =  -r^  T  A)  (3b) 

.  2p^{U'-rW'cosAoi±AnT?W'sinA«i} 

■  (S') 

In (3), the ISI-dependent parameters are $\beta = (U', W', \Delta\phi)$,
where $U'$ and $W'$ are the values of the arithmetic and geometric
means, respectively, of the SNRs at the beginning and at the
end of the bit interval, when the SNR value is $\rho_x = 1$; $\pm\Delta\phi$
are the possible values of the output sample (a differential
phase) in the absence of noise.


For demodulation using a limiter-discriminator, the CHF
for a demodulator output sample has the form [1]

C.(v;  p^;0)  =  e-“f' (4^) 

with $\alpha$ denoting the average number of FM “clicks” [2] and
$C_\psi(\nu; \rho_x; \beta)$, the CHF for $\psi$, the value of the modulo-$2\pi$
differential phase of the signal in noise at the end of the bit
interval. An estimate of $\alpha$ is [1]


Q  =  (A«i/2T)e“^^'  (4b) 

The probability density function (PDF) for $\psi$ exists in integral
form  [2],  and  a  general  expression  for  its  CHF  does  not  exist  in 
closed  form.  However,  an  excellent  approximation  for  the  CHF 
has  been  found  to  be  [1] 


Px  ,  0)^  «(/>)  •  +  [1  -  </»)]  •  e 


(4c) 


with  a\  and  the  mixture  parameter  «  approximated  heur- 
isticallyby  _2p  ,  3  al-i^h(p)li 

e(p)  =  e  ''  and  erg  =  — i_  ^Q\ —  •  (4d) 

The formula for $\sigma_G^2$ is based on equating the actual phase noise
variance, $\sigma_\psi^2$, with that of the approximate PDF corresponding
to  (4c).  For  high  SNR,  the  differential  phase  PDF  approaches 
that  for  a  Gaussian  distribution,  while  for  low  SNR  it  ap¬ 
proaches  that  for  a  uniform  distribution. 


A form of “adaptive gain control” (AGC) combining has
been proposed, under which it is assumed that the SNR $\rho_i$ can
be measured on each hop and that the L detector output samples
are weighted in proportion to the value of $\rho_i$ associated
with each sample. Referenced to the absence of jamming, for
partial-band jamming the two weights referred to in (2) then
can be expressed as

$$w_0 = 1 \quad \text{and} \quad w_1 = \rho_T / \rho_N. \qquad (5)$$

This weighting scheme has the effect of improving the quality
of the decision variable by de-emphasizing the samples on
jammed hop transmissions, unless all hops were jammed. In
the oral presentation, example comparisons will be shown of the
uncoded bit error probability (vs. $E_b/N_J$) that results from using
an adaptive gain control combining scheme for differential
detection and for limiter-discriminator detection under the
assumption of selected values of L, $E_b/N_0$, the FM modulation
index h, and IF time-bandwidth product $W_{IF}T$; and for worst-
case partial-band noise jamming, in which $\gamma$ is chosen to
maximize the error probability. The effective jammer spectral
density level is assumed to be its average $N_J$ over the
hopping band divided by $\gamma$, so that there is a tradeoff between
how much of the band is jammed and the strength of the
jamming in the jammed portion of the band. Also, as L
increases, the energy per hop is reduced but the chances of
having an unjammed hop increase; the tradeoff involved is
that the noncoherent combining of the detector output samples
cannot recover the total energy effectively, that is,
$P_e(L) > P_e(1)$ without jamming.

The  use  of  AGC  diversity  combining  results  in  significant 
performance  improvement.  In  one  example,  a  10“^  probability 
of bit error can be achieved for about 24 dB more jammer power
when L is increased from 1 to 2. Generally, the amount of
“diversity gain” depends on both the type of demodulator and
on $E_b/N_0$. Comparisons of demodulator types based on having
their $E_b/N_0$ values sufficient to produce the same probability of
error for L = 1 and no jamming reveal that a system using the
differential  detector  with  diversity  will  outperform  one  using 
the  limiter-discriminator;  the  reason  for  this  effect  is  that  the 
limiter-discriminator  incurs  more  noncoherent  combining  losses 
than  does  the  differential  detector.  In  the  oral  presentation  of 
this  paper,  additional  results  will  be  shown,  including  com¬ 
parisons  of  the  adaptive  gain  control  scheme  with  a  hard- 
decision  combining  scheme  in  which  a  majority  vote  is  taken 
among  bit-value  decisions  made  on  each  hop. 
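
The difference between the two combining rules can be illustrated with a small sketch. It assumes a sign convention in which a positive demodulator sample indicates bit 1, assumes the receiver knows which hops are jammed, and takes the AGC weights from (5) as $w_0 = 1$ and $w_1 = \rho_T/\rho_N$; the function names and the numerical values are hypothetical and do not come from the reported results.

```python
def majority_vote(hop_samples):
    """Hard-decision combining: one vote per hop, ties resolved as bit 0."""
    votes = sum(1 if z > 0 else -1 for z in hop_samples)
    return 1 if votes > 0 else 0

def agc_combine(hop_samples, jammed, rho_n, rho_t):
    """Weighted-sum combining with w0 = 1 (clean hop), w1 = rho_t / rho_n."""
    w0, w1 = 1.0, rho_t / rho_n
    z = sum((w1 if is_jammed else w0) * sample
            for sample, is_jammed in zip(hop_samples, jammed))
    return 1 if z > 0 else 0

if __name__ == "__main__":
    # Bit 1 sent on L = 3 hops; the last two hops are jammed and corrupted.
    samples = [0.9, -2.5, -1.0]
    jammed = [False, True, True]
    print(majority_vote(samples))                          # jammed hops outvote the clean one
    print(agc_combine(samples, jammed, rho_n=10.0, rho_t=0.5))
```

In this toy case the two jammed hops outvote the clean one under majority voting, while the AGC weights shrink their contribution enough for the clean hop to decide the bit.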

[1] J. S. Lee Associates, Inc., “Studies of ECCM Improvements for
Frequency-Hopping CPFSK Systems,” report to US Army Research
Office under contract DAAL03-89-C-0010, May 1990. (DTIC
accession number AD-A222 995.)

[2] R. F. Pawula, S. O. Rice, and J. H. Roberts, “Distribution of the
Phase Angle Between Two Vectors Perturbed by Gaussian Noise,”
IEEE Trans. on Commun., vol. COM-30, pp. 1828-1841 (Aug. 1982).
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Abstract 


In this paper we address the parallel decoding problem in
a general formulation. The structure of the receiver consists
of a bank of z demodulators, each followed by an errors-and-
erasures correcting decoder. Each demodulator has a threshold
$\theta$ that determines an erasure region; we then assign a cost $f(\theta)$
to the interference for causing an erasure and a (larger) cost
$\bar{f}(\theta)$ for causing an error. The goal in designing the receiver
is to choose the thresholds to maximize the interference cost
necessary to cause a decoding error. We demonstrate that the
above formulation is solvable for many channels of interest.

Problem  Statement 

Parallel decoding is of considerable interest as a way to improve
the performance of coded communications systems limited by
various types of interference. Early work on parallel decoding
dates back to Forney's work on generalized minimum distance
decoding [1]. Parallel decoding has been used since then
for decoding concatenated codes [2, 3, 4, 5]. In parallel decoding
the channel output is processed by z branches; each branch
consists of a demodulator connected to a decoder. The i-th
demodulator is characterized by a threshold $\theta_i$ for deciding whether
to erase or to output its best estimate to the decoder; the input
to the decoder is then an erasure, a correct estimate, or an
erroneous symbol. Then z identical bounded-distance decoders
(one for each branch) are used to correct the maximum number
of errors and erasures. The receiver, therefore, produces z
candidate estimates of the transmitted codeword, from which the most
likely codeword is selected. The interference distorts the signal
and there is a cost associated with each type of distortion. The
cost of causing an erasure in branch i is $f(\theta_i)$. The larger cost
$\bar{f}(\theta_i)$ is incurred for causing an error to the nearest code
symbol. The above communication system can be characterized by
the following game with two players: the communicator and a
jammer.

Communicator’s Game - Choose the thresholds $\theta_1, \ldots, \theta_z$ to
maximize the minimum cost necessary for a jammer to cause
the overall decoding system to err (not decode to the correct
codeword).

Jammer’s Game - Choose the distortion to minimize the cost
needed to force the communicator to make an error no matter
what thresholds are used.

Solution  and  Examples 

The  solution  to  the  game  above  can  be  proven  to  satisfy  the 
following  set  of  equations: 

!(h)  ^  f(0k-x)  =  a  A:=  1,2,...,2-H 

with  the  following  boundary  conditions 

/(<».+.)  =  m), 

=  im  +  noo) 


ing).  The  above  formulation  is  valid  for  parallel  demodulation 
or decoding of concatenated codes in which the inner decoder
of branch i is characterized by a threshold $\theta_i$. Also, the above
formulation is solvable for many channels including the simple
M-ary input-output channel with the Hamming distance as the
cost function and the additive channel where the cost function
corresponds to Euclidean distance. The next two examples
illustrate the applicability to noncoherent channels with ratio-
threshold-like decision rules.

Noncoherent Case - Ratio Threshold

Consider the transmission of M-ary code symbols over a
continuous additive white Gaussian channel, using orthogonal
Frequency Shift Keying (FSK). The received signal is noncoherently
demodulated with the resulting $M$ matched-filter energy
outputs, each corresponding to a transmitted M-ary symbol. In
conventional receivers the symbol chosen is the one that corresponds
to the largest energy value. With no loss of generality, assume that
symbol 0 is transmitted and
$Y_1 = \max\{Y_1, \ldots, Y_{M-1}\}$. The decision rule for a decoder
with Viterbi ratio threshold characterized by $\theta$ is:

Choose 0 if $Y_1/Y_0 < \tan\theta$;

Erase if $\tan\theta < Y_1/Y_0 < \cot\theta$;

Error if $Y_1/Y_0 > \cot\theta$.

It  can  be  shown  (see  [6])  that  for  worst  case  interference  the 
cost  function  is 

/(Bi)  =  s\n^9i  e[0, 

m)  =  cos^9.. 

For an arbitrary number of branches, the optimal $\theta$'s and the
error-correcting capability $\Omega$ of the decoding algorithm are,
respectively.


It can be shown [6] that the error-correcting capability, $\Omega$,
for the difference-thresholding type of decoder is larger than that
for ratio thresholding by a factor of $\sqrt{2}$. This corresponds to
a gain of 1.5 dB in signal-to-noise ratio.
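
As a quick arithmetic check of the 1.5 dB figure, the sketch below converts the factor $\sqrt{2}$ to decibels. It assumes the factor is a power-like ratio, so that decibels are $10\log_{10}$ of the ratio; the function name `gain_db` is illustrative and does not come from the paper.

```python
import math


def gain_db(power_ratio: float) -> float:
    """Convert a power ratio to decibels (10 * log10 of the ratio)."""
    return 10.0 * math.log10(power_ratio)


# Assumed reading: the sqrt(2) advantage is a power ratio.
sqrt2_gain = gain_db(math.sqrt(2.0))
print(f"{sqrt2_gain:.2f} dB")
```

Under that assumption the value is $5\log_{10} 2$, which rounds to the 1.5 dB quoted above.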

[1] G.D. Forney, Concatenated Codes, MIT Research Monograph
No. 37, The MIT Press, Cambridge, Mass.

[2] I.I. Dumer, V.A. Zinovev, and V.V. Zyablov, "Cascaded
decoding with respect to minimal generalized distance," Problems
of Control and Information Theory, vol. 10, no. 1, 1982, pp. M7.

[3] A.A. Hassan and W.E. Stark, "On decoding concatenated
codes," IEEE Trans. Inform. Theory, vol. 36, no. 3, pp. 677-683,
May 1990.

[4] S.I. Kovalev, "Two classes of minimum generalized distance
decoding algorithm," Probl. Peredachi Inf., vol. 22, no. 3, 1986.

[5] V.V. Zyablov, "Optimization of concatenated decoding
algorithms," Probl. Peredachi Inf., vol. 9, no. 1, pp. 26-32, 1973.

[6] A.A. Hassan and W.E. Stark, "Parallel decoding for channels
with jamming," Proceedings of the IEEE Conference on
Military Communications, October 1992.


and with $f(\theta_0) = 0$ if the demodulation is continuous and $f(\theta_0) =$
1  if  the  demodulation  is  discrete  (e.g.  Hamming  distance  decod¬ 
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Abstract:  A  new  analysis  shows  that,  when  we  apply  the 
Lempel-Ziv incremental parsing algorithm to an i.i.d. source with
probabilities $p_i$, $i = 1, \ldots, m$, the expected length
$E|W_t|$ of the $t$-th parsed segment $W_t$ is given by a simple
formula. Following this approach, we can show a VF (variable-to-fixed
length) version of the Ziv-Lempel universal coding theorem.


Remark: This suffices to show that $\lim_{t_0 \to \infty} R(t_0) = H$.

These theorems are based on the following two lemmas.
Let us consider $W_{t+1}$ instead of $W_t$, since the former gives
a more elegant discussion.

Lemma  1 


Let $A = \{a_1, \ldots, a_m\}$ be an alphabet. Denote by
$A^* := \bigcup_{k \ge 0} A^k$ the set of all strings and by
$\lambda \in A^*$ a null string. The Lempel-Ziv parsing
$LZ : A^\infty \to (A^*)^\infty$ is defined ([1]) for an instance
$LZ(x) = (W_1, W_2, \ldots, W_t, \ldots)$ such that
$x = W_1 W_2 \ldots$ and, for each $t \ge 0$, $W_{t+1}$ is recursively
determined to be the shortest string not in the set
$T_t := \{\lambda, W_1, \ldots, W_t\}$. For a given $T_t$, let
$\partial T_t$ denote the set of possible outcomes of $W_{t+1}$.
(Since $T_t$ can be regarded as a set of inner nodes of
a tree, $\partial T_t$ is a set of corresponding leaves.) Since
$|\partial T_{t-1}| = (m-1)t + 1$, at most
$\lceil \log_2 \prod_{t=1}^{t_0} ((m-1)t + 1) \rceil$ information
bits are sufficient to represent $W_1 W_2 \ldots W_{t_0}$. Now,
consider an i.i.d. process taking values on $A$ with probability
parameters $p_1, \ldots, p_m$. By renewal theory we may define a rate
$R(t_0)$ of the Variable to Fixed (VF) source code, which consists of
all possible concatenations of the first $t_0$ parsing segments, by

$$R(t_0) = \frac{\lceil \log_2 \prod_{t=1}^{t_0} ((m-1)t + 1) \rceil}{\sum_{t=1}^{t_0} E|W_t|} \qquad (1)$$

where $|W_t|$ denotes the length of the string $W_t$. The term
$E|W_t|$ is given exactly and asymptotically, respectively, by the
following two theorems.
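
The incremental parsing defined above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `lz_incremental_parse` and `leaf_count` are hypothetical, the input is a finite string (so the last phrase may be incomplete and is returned separately), and the leaf count is checked against the tree identity that $T_t$ has $(m-1)(t+1)+1$ leaves, which is the same statement as $|\partial T_{t-1}| = (m-1)t + 1$.

```python
def lz_incremental_parse(x):
    """Split x into phrases, each the shortest prefix of the remaining
    input that is not already in the set T of earlier phrases."""
    seen = {""}              # T_0 contains only the null string
    phrases = []
    start = 0
    for end in range(1, len(x) + 1):
        candidate = x[start:end]
        if candidate not in seen:
            seen.add(candidate)
            phrases.append(candidate)
            start = end
    tail = x[start:]         # possibly incomplete last phrase (already in T)
    return phrases, tail


def leaf_count(phrases, alphabet):
    """Count leaves of the parsing tree: one-symbol extensions of inner
    nodes (the null string and all phrases) that are not inner nodes."""
    inner = {""} | set(phrases)
    return sum(1 for w in inner for a in alphabet if w + a not in inner)


phrases, tail = lz_incremental_parse("abaabbaaab")
print(phrases, repr(tail), leaf_count(phrases, "ab"))
```

For a binary alphabet and five parsed phrases, the count returned by `leaf_count` agrees with $(m-1)(t+1)+1 = 7$.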


Theorem  1  (Main  theorem) 


£|W.|  =  (  n  )  (2) 

■=:  /=a  i=i 


where  the  null  product  is  1. 


Remark: In [2], the sum $S(t) = \sum_i E|W_i|$ is considered
directly and the following recursion is obtained by a different
approach from ours:


$$E|W_{t+1}| = \sum_{y \in A^*} p(y) \Pr\{y \in T_t\}, \qquad (4)$$

where $p(y_1^k) = p(y_1 \ldots y_k) = \prod_{i=1}^{k} p(y_i)$, and the
expectation is taken over all random generations of the parsing tree
$T_t$ after the $t$-th parsing.

The key term $\Pr\{y \in T_t\}$ in the above has an alternative
expression, given as follows.

Lemma 2 For any fixed $y_1^\infty \in A^\infty$, let $N_1, N_2, \ldots$
be independent (not identically distributed) geometric random
variables, distributed as

$$\Pr\{N_k = n\} = (1 - p(y_1^k))^{n-1} p(y_1^k) \quad \text{for all } n \ge 1. \qquad (5)$$

Then

$$\Pr\{y_1^k \in T_t\} = \Pr\Big\{\sum_{l=1}^{k} N_l \le t\Big\}. \qquad (6)$$

To obtain Theorem 2, we have evaluated the following inequalities
for a summand in Lemma 1. These may be interesting in themselves.

Lemma 3 (An upper bound)

$$\sum_{y \in A^k} p(y) \Pr\{y \in T_t\} \le E[1 - (1 - p(Y_1^k))^t]. \qquad (7)$$

Lemma 4 (A lower bound) For any real $J > 0$,

$$\sum_{y \in A^k} p(y) \Pr\{y \in T_t\} \ge (1 - e^{-J})^k \Pr\{k J p^{-1}(Y_1^k) \le t - k\}. \qquad (8)$$

References


5(1)  =  *  +  Y,tL{k  )p'‘(l  -  -  1). 

i=l  tsO 


with $S(-1) = 0$.


Theorem 2 (VF coding theorem) For an i.i.d. source
with entropy $H$,

$$\lim_{t \to \infty} \frac{E|W_t|}{\log t} = H^{-1}. \qquad (3)$$


[1] J. Ziv and A. Lempel, "Compression of Individual Sequences
via Variable-rate Coding," IEEE Trans. on Information Theory,
vol. IT-24, no. 5, pp. 530-536, Sept. 1978.

[2] Y.M. Shtarkov and T.J. Tjalkens, "The redundancy of the
Ziv-Lempel Algorithm for Memoryless Sources," The Preliminary
Manuscript for the International Workshop for Information
Theory, at Eindhoven, Aug. 1990.

[3] T. Kawabata, "Exact Analysis of the Lempel-Ziv Parsing
Algorithm for I.I.D. Source," IEEE Trans. on Information
Theory, to appear.
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Abstract 

Ziv and Lempel proposed two important universal coding algorithms in
1977 and 1978 [1, 2]. While the second algorithm, called LZ78, has been
sufficiently analyzed in the literature, the first, LZ77, has not. LZ77
parses input data into a sequence of phrases, each of which is the
longest match in a fixed-size sliding window which consists of the
previously encoded M symbols. Each phrase is replaced by a pointer to
denote the longest match in the window. Then the window slides to just
before the next symbol to be encoded, and so on. In this paper, we
modify the LZ77 algorithm to restrict pointers to starting only at the
boundary of a previously parsed phrase in the window. Although the
number of parsed phrases is expected to exceed that of LZ77, the number
of bits needed to encode pointers is considerably reduced since the
number of possible positions to be encoded is much smaller. Then we show
that for any stationary finite state source, the modified LZ77 code is
asymptotically optimal with the convergence rate
$O(\log\log M / \log M)$, where M is the size of the sliding window.

Definition  of  Stationary  Finite  State  Sources 

Let $X_1^n = X_1 X_2 \ldots X_n$ be a finite output sequence of an
information source where $X_i$ is a random variable which takes values
in the finite set $\mathcal{A}$ with cardinality
$|\mathcal{A}| = \alpha$ ($< \infty$). Also, let
$S_0^n = S_0 S_1 S_2 \ldots S_n$ be a sequence of states of the source
where $S_i$ is also a random variable which takes values in the finite
set $\mathcal{S}$ with cardinality $|\mathcal{S}| = \beta$
($< \infty$), where $S_0$ is called the initial state. We also
use a bold italic letter to represent a sequence or string of symbols
such as $X$ or $S$ if its length is given in the context. The structure
of the source is described by giving the statistical dependence of
$X_i$ on states of the source: A source is said to be finite-state if
the joint probability of $X = X_1^n$ and $S = S_0^n$ is given by

$$\Pr(X = x, S = s) = q(s_0) \prod_{j=1}^{n} p(x_j, s_j \mid s_{j-1})$$

for any $x = x_1 x_2 \cdots x_n \in \mathcal{A}^n$ and
$s = s_0 s_1 \cdots s_n \in \mathcal{S}^{n+1}$, where
$p(a, s \mid t)$, $a \in \mathcal{A}$, $s, t \in \mathcal{S}$, are
conditional joint probability mass functions and $q(\cdot)$ is a
probability mass function. Since the state sequence $S$ is not
apparent, only the output sequence $X$ is observable. The probability
of $X$ is determined by

$$\Pr(X = x) = \sum_{s \in \mathcal{S}^{n+1}} \Pr(X = x, S = s).$$

Both probabilities $\Pr(X = x, S = s)$ and $\Pr(X = x)$ are also
denoted by $P(x, s)$ and $P(x)$, respectively. Moreover, let
$\mathcal{P}_S$ denote the class of all stationary finite-alphabet
sources with $\mathcal{A}$ and $\mathcal{S}$ where
$|\mathcal{A}| = \alpha$ and $|\mathcal{S}| = \beta$.
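
The marginal probability of an output sequence can be sketched numerically. The example assumes the joint probability factors as $q(s_0) \prod_j p(x_j, s_j \mid s_{j-1})$, and compares the explicit sum over state sequences with an equivalent forward recursion. The two-state binary source, its numbers, and all function names are hypothetical, and $q$ is not chosen to be stationary.

```python
from itertools import product


def joint_prob(x, s, q, p):
    """P(x, s) = q(s_0) * prod_j p(x_j, s_j | s_{j-1}); len(s) == len(x) + 1."""
    prob = q[s[0]]
    for j, a in enumerate(x, start=1):
        prob *= p[(a, s[j], s[j - 1])]
    return prob


def marginal_prob_bruteforce(x, q, p, states):
    """P(x) as the explicit sum of P(x, s) over all state sequences s."""
    return sum(joint_prob(x, s, q, p)
               for s in product(states, repeat=len(x) + 1))


def marginal_prob_forward(x, q, p, states):
    """The same P(x), accumulated one symbol at a time over the states."""
    weight = {t: q[t] for t in states}
    for a in x:
        weight = {s: sum(weight[t] * p[(a, s, t)] for t in states)
                  for s in states}
    return sum(weight.values())


# Hypothetical source; key is (symbol a, next state s, previous state t).
STATES = (0, 1)
ALPHABET = (0, 1)
Q = {0: 0.6, 1: 0.4}
P = {
    (0, 0, 0): 0.5, (0, 1, 0): 0.1, (1, 0, 0): 0.1, (1, 1, 0): 0.3,
    (0, 0, 1): 0.2, (0, 1, 1): 0.2, (1, 0, 1): 0.3, (1, 1, 1): 0.3,
}

print(marginal_prob_forward((0, 1, 1), Q, P, STATES))
```

The forward recursion costs on the order of $n\beta^2$ operations instead of $\beta^{n+1}$ terms, which is why it is the usual way to evaluate $P(x)$ in practice.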

The coding scheme considered here is mostly the same as the LZ77
scheme, except that pointers may denote only boundaries of previously
parsed phrases, as follows:

A  Parsing  Algorithm  PARR 

Repeat the following steps until the input is exhausted:

1. Find the longest match of the current input sequence from the
previously parsed sequence within the window of length M. The
start position of the match must be that of the head of a previously
parsed phrase. The longest match is allowed to extend beyond
the window as long as it matches the input data.

2. The longest matched sequence is parsed into a phrase W.

3. If no matched sequence is found, the next input symbol is
parsed into a phrase of length 1. □

The above algorithm to parse the input sequence will be denoted as
PARR (Parsing Algorithm based on Restricted Reproducibility).
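
The three steps above can be sketched as follows. This is an illustrative reading rather than the authors' implementation: the function name `parr_parse` is hypothetical, ties between equally long matches are broken in favour of the earliest phrase head (an assumption), and the window is taken to be the M positions immediately before the current one.

```python
def parr_parse(x, window):
    """Parse x into phrases; return (phrase, match_start) pairs, where
    match_start is None for a length-1 phrase emitted without a match."""
    phrase_starts = []       # head positions of previously parsed phrases
    out = []
    pos = 0
    while pos < len(x):
        best_len, best_start = 0, None
        for s in phrase_starts:
            if s < pos - window:
                continue     # this head lies outside the sliding window
            length = 0
            # The match may run past the window (and overlap pos) as long
            # as it keeps agreeing with the input.
            while pos + length < len(x) and x[s + length] == x[pos + length]:
                length += 1
            if length > best_len:
                best_len, best_start = length, s
        if best_len == 0:
            best_len = 1     # step 3: no match, emit a single symbol
        out.append((x[pos:pos + best_len], best_start))
        phrase_starts.append(pos)
        pos += best_len
    return out


print(parr_parse("abababab", window=8))
```

Because only phrase heads are candidate start positions, the inner loop scans at most one position per earlier phrase instead of every position in the window, which is the source of the pointer-cost saving described in the abstract.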

Coding  Methods 

The encoder converts each phrase into a binary codeword under one
of two separate modes. Which mode is taken depends on the status
of the algorithm when parsing a phrase. One mode corresponds to
the case in which no match is found. The other corresponds to the case
in which the longest match is found. The former case is denoted as the
direct mode, and the latter case as the indirect mode. In the direct
mode, the encoder sends out one bit 'zero' followed by the symbol which
has been parsed into a phrase of length 1 since there is no match for
it. On the other hand, in the indirect mode, it sends out a codeword
consisting of three parts: one bit 'one', the position $p$ of the
longest match W, and the length $l$ of W. To encode a symbol, a matched
position, and a length, it is sufficient to assign
$\lceil \log \alpha \rceil$ bits for the symbol,
$\lceil \log m \rceil$ bits for $p$, and
$2\lfloor \log(l + 1) \rfloor$ bits for $l$.

The decoding process is quite simple. In the first step the decoder
reads in a single bit. If this is a zero, the next
$\lceil \log \alpha \rceil$ bits contain a symbol. If the input bit is
a one, the decoder reads in a matched position and length instead of a
symbol. Then the decoder can reproduce a phrase from the contents of
the current window. The above procedure is repeated until the code
sequence is exhausted.

Results 

Let us suppose that $x$ is parsed into $c$ phrases,
$W_1, W_2, \ldots, W_c$, through PARR and encoded according to the
method described above. Then the total length of a codeword
$\rho_M(x)$ is given by

$$\rho_M(x) = \sum_{j \in J_0} \{1 + \lceil \log \alpha \rceil\} + \sum_{j \in J_I} \{1 + \lceil \log m_j \rceil + 2\lfloor \log(l_j + 1) \rfloor\}$$

where $m_j$ is the number of previously parsed phrases that can be
referred to when parsing $W_j$ into a phrase, and $l_j$ is the length
of $W_j$. Moreover, $J_0$ is the set of phrases which belong to the
direct mode and $J_I$ is the set of those which belong to the indirect
mode, that is, $c = |J_0| + |J_I|$. Then, we have obtained the
following main result.

Theorem Let $\{X_i\}$ be a stationary finite-state stochastic process
with the state set $\mathcal{S}$. Let $\rho_M(x)$ be the codeword
length associated with $X = X_1 X_2 \ldots X_n$, where M is the length
of a sliding window associated with PARR. Then, for any $M > 0$, if
$n$ is sufficiently large, then

$$\frac{1}{n} \rho_M(X) \le -\frac{1}{n} \log \max_{P \in \mathcal{P}_S} P(X) + O\left(\frac{\log\log M}{\log M}\right)$$

where $\mathcal{P}_S$ denotes the class of all stationary finite-state
sources with the finite alphabet $\mathcal{A}$ and the finite state
set $\mathcal{S}$.
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The  main  purpose  of  data  compression  is  to  represent  source 
data in a compact form by applying coding techniques. It can
use a smaller amount of data in place of the original large volume of
information. Compression techniques can be applied to data storage
and data transmission applications. They can save the space needed
when storing enormous amounts of data and reduce the time used when
transmitting data via communication channels. Data compression plays
a very important role in modern information systems. By its result,
data compression can be classified into two categories: lossless
and lossy compression. Lossless compression assures that the original
data can be exactly recovered without any distortion. We will focus
on the lossless case in the following discussion. Common lossless
techniques include run-length coding, Huffman coding, arithmetic
coding, Lempel-Ziv coding, and BSTW coding. The two major
fundamental models are the probabilistic model and the dictionary
model.

One obvious redundancy of many data sets is the repeated
occurrence of substrings or patterns. Techniques that factorize
common substrings are known as dictionary techniques. A dictionary
of common substrings can be constructed using dictionary
techniques either on the fly or in a separate pass. A technique may
use the same dictionary for all input data sets (static) or construct
a different dictionary for each data file (adaptive or semi-adaptive).
Lempel-Ziv coding can be classified as one of the adaptive
dictionary techniques.

Cache memories are high-speed buffers which are inserted
between the processors and main memory to capture those portions
of the contents of main memory which are currently in use. Some
well-known management policies include block placement, block
identification, block replacement, and write strategy. The idea of
fast access in a cache can be applied to data compression. If we
collect frequently occurring substrings (patterns) in a small
cache-like dictionary and encode these patterns with fewer bits, the
overall compression performance should be better.

For  dictionary  techniques,  policies  that  maintain  the  contents 
of  dictionaries  can  be  adopted  from  those  of  cache  management. 

This work was supported by the National Science Council, Taipei, Taiwan,
Republic of China, under contract No. NS(,l-0408-lv002-232


We propose a new adaptive multi-dictionary model to describe
the behavior of compression coding by the management policies
of the dictionary. Parameters defined in the model include: the
number of dictionaries, the sizes of dictionaries, the generate
policy to define new words during encoding, the codeword
representation mapping that specifies the output bit pattern of each
dictionary entry, the flagbit representation mapping that specifies
the flag bit pattern to point out the currently used dictionary, the
placement policy to decide where a dictionary word should be placed,
the replacement policy to throw away old entries when a dictionary
fills, the update policy to control the exchange of words among
dictionaries, and the adjustment policy to modify the codeword mapping
after each coding step.

Under the proposed model, the coding process of dictionary-based
coding can be viewed as the construction, insertion, deletion,
and modification of dictionary contents. The characteristics
of Lempel-Ziv type methods such as LZ77, LZ78, and LZW
can be exactly described by the specified management policies.
Meanwhile, some other non-dictionary techniques can also be
included in our model. By relating the coding procedures to the
dictionary management actions, we have successfully interpreted
Huffman coding and arithmetic coding as special cases under the
proposed model.

The model describes the operational behavior of dictionary-based
coding by nine parameters. Compression efficiency is
affected greatly by those factors. The features of our proposed
model include multiple dictionaries, a time-variant codeword
mapping mechanism, adaptive vocabulary exchange capability between
dictionaries, and the placement, replacement, and update policies for
dictionary vocabulary.

Possible applications of the proposed coding model are as follows.
First, it provides a unified framework to interpret existing
techniques. Second, it points out possible directions to improve
current techniques. Third, new coding systems can be easily developed
by choosing suitable management policies. The influences of different
parameters on compression are topics for future research.
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The  expected  per-letter  redundancy 

$$R_n(C_n, \mu) = \frac{1}{n} \sum_{x_1^n} \left( L_n(x_1^n) + \log \mu(x_1^n) \right) \mu(x_1^n)$$

measures how far the n-block-to-variable-length binary
prefix code $C_n$, with length function $L_n$, is from being
optimal for a given source distribution $\mu$. If a sequence of
block-to-variable-length prefix codes is to be used on a
class $\mathcal{S}$ of sources then a standard requirement is
universality, namely that $R_n(C_n, \mu) \to 0$ for each member $\mu$
of the class, as $n$ goes to infinity. The existence of universal
codes for the class $\mathcal{E}$ of all ergodic sources with a fixed
alphabet is well known; for example, the Ziv-Lempel code
as well as other codes are known to be universal for the
class of all ergodic sources. A stronger requirement is to
ask that $R_n(C_n, \mu)$ go to 0 at some universal rate for each
member of the class. Sequences of prefix codes which have
the property that the expected redundancy per symbol is
$O((\log n)/n)$ have been constructed for various classes of
sources, such as the class of memoryless sources, the class
of Markov sources of a given order, and, more recently, the
class of finite-state sources with a given number of states.
Furthermore, such results can easily be extended to countable
unions of such “nice” classes. Thus, for example, there
is a sequence such that $R_n(C_n, \mu) = O((\log n)/n)$ for each
$\mu$ which is Markov of some order, or for each finite-state
process $\mu$.
The  purpose  of  this  paper  is  to  show  that  rates  of  con¬ 
vergence  for  redundancy  are  possible  only  for  special  classes 
of  sources,  that  is,  there  is  no  universal  redundancy  rate 
for  any  sequence  of  prefix  codes  on  the  class  £  of  all  ergodic 
sources.  The  following  is  a  precise  statement  of  this  result. 


Theorem 1 For each $n$ let $C_n$ be a prefix n-code and suppose
$\lim_n \rho(n) = 0$. There is an ergodic source $\mu^*$ and a
subsequence $n_m$ such that


ll^m—oo 


p{nm) 


00,  a.s. 


to the author by Shtarkov. Iteration of the method leads
to a sequence of periodic measures $\mu^{(n_m)}$ such that


lim 

m^oo 


/>("-) 


=  oo, 


and such that only a few substitutions and deletions are
needed in a $\mu^{(n_m)}$ periodic sample path to produce a
$\mu^{(n_{m+1})}$ periodic sample path. The latter will guarantee the
existence of a limit measure $\mu$ for which Theorem 1 holds.

Acknowledgements.  The  author  was  partially  sup¬ 
ported  by  NSF  grants  DMS8742630  and  DMS-9024240. 


The starting point for the construction of a counterexample
$\mu$ is a simple method for selecting a periodic measure
$\mu^{(n)}$ such that $R_n(C_n, \mu^{(n)}) \ge 1 - o(1)$, a method
suggested
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Abstract 

The LZW (Lempel-Ziv-Welch) data compression method is the most
popular universal coding algorithm and is used in several practical
systems. The LZW method, however, has the following two disadvantages:
the compression ratio converges too slowly and the compressibility is
poor when the entropy of the information source is very high. In
order to alleviate these disadvantages, we propose a novel source
coding technique based on the LZW algorithm and a splay tree. Our
proposed method is superior to the LZW method in terms of universality
and convergence. In particular, it is very effective for compressing
high entropy information sources.

Summary 

In the LZW algorithm [1], at the beginning of encoding (Case 1)
or in the case that the entropy of an information source is very high
(Case 2), parsed strings in the string table are not used efficiently
and symbols are encoded frequently. In these cases, it is not effective
to map a source alphabet $A = \{a_1, a_2, \ldots, a_\alpha\}$ into
fixed-length codes which are longer than
$\lceil \log_2(\alpha) \rceil$ bits, where $\alpha$ is the number of
symbols in the source alphabet $A$. Here, for the $i$th segment,
$L_i^{LZW}$ (bits), the length of the $i$th codeword, is

$$L_i^{LZW} = \lceil \log_2(k) \rceil,$$

where $k$ is the number of entries in the last string table. As a
result, this mapping causes the following two problems: in Case 1, the
compression ratio converges very slowly, and in Case 2, the
compressibility is poor when the entropy of the information source is
very high.
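
The overhead described above can be seen in a minimal sketch of a textbook LZW encoder that records only codeword widths. It assumes each codeword is written with $\lceil \log_2 k \rceil$ bits, where $k$ is the current number of table entries, and that the table starts with all $\alpha = 256$ single-byte strings; the function name `lzw_code_lengths` and the table growth schedule are illustrative conventions, not taken from the paper.

```python
import math


def lzw_code_lengths(data: bytes, alphabet_size: int = 256):
    """Return the bit width of each LZW codeword emitted for data."""
    table = {bytes([i]): i for i in range(alphabet_size)}
    lengths = []
    w = b""
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc                         # keep extending the match
        else:
            lengths.append(math.ceil(math.log2(len(table))))
            table[wc] = len(table)         # register the new string
            w = bytes([b])
    if w:
        lengths.append(math.ceil(math.log2(len(table))))
    return lengths


print(lzw_code_lengths(b"abc"))
```

With no repeated strings, every codeword after the first is one bit wider than the $\lceil \log_2 \alpha \rceil = 8$ bits a plain symbol code would need, which is the inefficiency of Case 1 and Case 2.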

On the other hand, binary trees are excellent in terms of
convergence of the compression ratio because the codewords assigned to
symbols depend on the probability of their occurrence. Jones
proposes a data compression method using a splay tree [2], which is a
self-adjusting binary tree. It can adjust itself quickly to a local
redundancy of the information source, and is effective for high
entropy information sources.

In this study, we propose a novel source coding technique based on
the LZW algorithm in order to alleviate the disadvantages of the LZW
method. This technique maps the source alphabet into variable-length
codes by using the splay tree, and maps parsed strings into the
shortest fixed-length codes suitable for the number of entries in
the string table. However, in this mapping, the decoder cannot
recognize which code is sent, variable or fixed. In order to
distinguish one from the other, we use a flag bit which is added to the
codeword. Here, for the $i$th segment, the length of the $i$th
codeword is

rPropoit  _  /  variable-length  -Hi  (0  <  i  <  a) 

•  ~  I  r  log2{^  -  -  1)}  1  +  1  (“  <  '  <  Omw) 

where $D_{max}$ is the maximum dictionary size. As a result, the
proposed method is expected to be superior to the LZW method in terms
of convergence of the compression ratio and of compressibility when
encoding high entropy information sources.

Let the number $\alpha$ of symbols be 256. In the proposed method and
the LZW one, the dictionary size, which corresponds to the maximum
number of parsing segments, is restricted to at most 4096. Both methods
use the LRU (Least Recently Used) deletion heuristic. After a symbol
is encoded, the splay tree is updated by using semi-splaying, a variant
of splaying.

Figs. 1 and 2 show the compression ratio at the beginning of encoding
for a C program and for image data digitized using 256 grey levels,
respectively. For a low entropy information source such as the C
program, the proposed method gives a high compression ratio which
is almost equal to that of the LZW method because parsed strings can
be encoded effectively. For a high entropy information source such
as image data, the LZW method has structural difficulty compressing
because parsed strings cannot be encoded effectively. The proposed
method, however, can compress further because it maps symbols into
variable-length codes. Furthermore, the proposed method can adapt
itself to these files more quickly than the LZW method.

Fig. 1 Compression ratio vs length of input sequence (C program)

The proposed method is not only superior to the LZW method in
terms of convergence but also compresses high entropy information
sources effectively. That is, the proposed method has higher
universality.
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Abstract 

We are looking for an “essential statistic” of a
finite-alphabet ergodic source that, under a given
storage (memory) constraint, will allow discrimination
between the given source and any other finite-alphabet
source.

In our model an encoder is given the n-th order
statistics of a stationary process, and the encoder
output is a binary N-vector. A discriminator observes
the n-th order statistics of a second source
that is either identical to the first source or differs
from it by a specified Kullback-Leibler divergence.
In a sense made precise in the paper, we show that
when n is large, this can be done if and only if
N > exp(nH), where H is the entropy of the first
source.
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1  Introduction 

In a recent paper submitted to IEEE Transactions on Information
Theory [1], we introduced BAG. BAG is a variable to fixed
block coder in that the input is parsed into variable length
substrings which are encoded with fixed length output strings.
Assume the input is taken from an alphabet with $m$ symbols and
the codebook has $K$ codewords. With each input symbol, the
encoder splits the set of codewords into $m$ disjoint, nonempty
subsets. The recursion continues until fewer than $m$ codewords
remain. One of these is transmitted, and the encoder is
reinitialized. The encoding process is described in Figure 1.


K = Ks;
A = 1;
while ((l = getinput()) != EOF) {
    Compute K1, K2, ..., Km;
    A = A + sum_{j=1}^{l-1} Kj;
    K = Kl;
    if (K < m) {
        Output code(A);
        A = 1;
        K = Ks;
    }
}
doeof(A, K);

Figure 1: Basic BAG Encoder.

The $K_j$ satisfy the following: 1) $K_j = 1 + L_j (m - 1)$ for
$L_j = 0, 1, 2, \ldots$ and 2) $\sum_{j=1}^{m} K_j = K$. The first
condition assures two things: that $K_j > 0$ and that $K_j$ equals the
number in a complete and proper set. Let $N(K)$ denote the expected
number of input symbols encoded with $K$ codewords. For i.i.d. inputs,
$N(K)$ satisfies

$$N(K) = 1 + \sum_{j=1}^{m} p_j N(K_j) \qquad (1)$$

The principal question is how the $K_j$ are determined. In [1],
we offered several methods. Firstly, $K_j$ can be chosen optimally
by dynamic programming. Secondly, $K_j$ can be chosen by
an arithmetic coding heuristic: $K_j = [p_j K]$, where $[p_j K]$ is a
quantization such that conditions 1) and 2) above are satisfied.
Thirdly, if we imagine that we can ignore the necessity that
$K_j$ be an integer, then take $K_j = p_j K$. This solution results in
a hypothetical entropy coder.
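
The i.i.d. recursion $N(K) = 1 + \sum_j p_j N(K_j)$ can be evaluated directly for a binary alphabet ($m = 2$), where condition 1) allows any positive integer and condition 2) reads $K_1 + K_2 = K$. The sketch below is a hedged illustration only: the rounding rule for $K_1$ is an assumed stand-in for the paper's quantization $[p_j K]$, the stopping rule $N(K) = 0$ for $K < m$ follows the description of the encoder, and the names `expected_inputs` and `entropy_bits` are hypothetical.

```python
import math
from functools import lru_cache


def expected_inputs(K: int, p: float) -> float:
    """N(K) = 1 + p*N(K1) + (1-p)*N(K2) for a binary i.i.d. source,
    with N(K) = 0 once fewer than m = 2 codewords remain."""
    @lru_cache(maxsize=None)
    def N(k: int) -> float:
        if k < 2:
            return 0.0
        # Assumed rounding rule: nearest integer to p*k, kept in [1, k-1]
        # so that both subsets are nonempty and K1 + K2 = k.
        k1 = min(max(int(p * k + 0.5), 1), k - 1)
        return 1.0 + p * N(k1) + (1.0 - p) * N(k - k1)
    return N(K)


def entropy_bits(p: float) -> float:
    """Binary entropy in bits per input symbol."""
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


K, p = 1000, 0.2
print(expected_inputs(K, p), math.log2(K) / entropy_bits(p))
```

For any such split the expected number of input symbols per codeword cannot exceed $\log_2 K / H$, so the second printed value serves as a ceiling against which a quantization rule can be compared.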

Denote the expected number of input symbols encoded
with $K$ codewords by $N_o(K)$, $N_h(K)$, and $N_e(K)$,
respectively. Then, for i.i.d. inputs, for all $K$ and for some
constant $C$, we showed the following:

$$\frac{\log K}{H} = N_e(K) \ge N_o(K) \ge N_h(K) \ge \frac{\log K}{H} - C \qquad (2)$$


For Markov sources, let
$p(j|l) = \Pr(x_i = a_j \mid x_{i-1} = a_l)$. Then
the recursion for $N(K)$ splits into two parts. The first is for the
first input symbol; the second is for all other input symbols:

$$N(K) = 1 + \sum_{j=1}^{m} p_j N(K_j \mid j) \qquad (3)$$

$$N(K \mid l) = 1 + \sum_{j=1}^{m} p(j|l) N(K_j \mid j) \qquad (4)$$

(5) 

where $N(K|l)$ is the number of input symbols encoded using
$K$ codewords given that the current input is $a_l$. ($N(K|l)$ does
not include the current input, $a_l$.) The heuristic is as follows:
Choose $K_j = [p(j|l) K]$. The optimal $K_j$ can again be chosen by
dynamic programming. We can state the following theorem:

Theorem:  If  the  Markov  chain  is  time-invariant,  ergodic, 
and  symmetrical  in  the  following  way. 


Hwi)=-j:p{j\i)iogp{j\i) 

;=i 


is  independent  of  /,  then,  for  all  /, 


(6) 

(7) 


and 


N{K)  > 


log  A* 


(8) 


Proof (sketched): The symmetry condition allows that the
entropy solution, $K_j = p_j K$, satisfies (4). The proof that
$N_h(K|l)$ satisfies (7) follows almost identically from [1]. (8)
follows directly from (7).

If the encoder starts anew with each block, then some loss of
efficiency occurs since the first symbol of each block is encoded
with its stationary probability, not its Markov one conditioned
on the previous symbol. However, encoding blocks separately
yields greater resistance to channel errors.

To get a feeling for the magnitude of $C$, we computed $N(K)$
for two situations. The first is a binary symmetric Markov chain
with crossover probability equal to 0.05. For 65536 codewords
(16 bits), $N_e = 53.4$, $N_o = 51.0$, and $N_h = 50.5$. The second is
a binary asymmetric Markov chain with crossover probabilities
equal to 0.05 and 0.50. Again for 16 bit codewords, $N_e = 45.3$,
$N_o = 45.2$, and $N_h = 44.4$.

Abstract: The usual representation of a random sequence on
a finite alphabet is obtained by recording the value occurring
in each position as the positions are scanned in some standard
order. Here, we propose a representation obtained by recording
the positions occupied by each value as the values are
scanned in some specified order. Entropy is preserved in
converting to the positional representation. Also, the unknown
positions can be arbitrarily rearranged as the occupied positions
are revealed. Under control of a memory model, we
propose a rearrangement acting to reduce the first-order
entropy. This allows better compression using an adaptive
method, such as the Lempel-Ziv algorithm. Memory effects
over large sample distances, multiple dimensions, or large
alphabets can be directly applied in predicting positions rather
than slowly learned by the adaptive coder. Some empirical
results for grey-scale television-quality images (480 rows by 512
columns by 256 intensities) are included.


A  Summary  of  the  Representation 

Suppose the source to be compressed is characterized as the
discrete-time, discrete-valued random process
$\{X_n;\ n \in \mathcal{N}\}$ where each $X_n \in \mathcal{V}$. For
simplicity, we will assume here that the sequences
$\mathcal{N} = \{1, \ldots, N\}$ and $\mathcal{V} = \{1, \ldots, V\}$
are finite.

This  model  is  sufficient  for  one  video  or  audio  frame  which  has 
already  been  quantized  over  a  bounded  interval. 

Typically, the source would be processed by considering the
values $X_n$ under the prearranged ordering of the
$n \in \mathcal{N}$. For a speech frame, we usually process the
samples in time order. For an image frame, we usually process the
pixels in the order they appear in a raster scan (left-to-right
pixels within top-to-bottom rows, say).

Consider the indicator process
$\{1_X(n,v);\ n \in \mathcal{N}, v \in \mathcal{V}\}$
derived from the source process by

$$1_X(n,v) = \begin{cases} 1, & \text{if } X_n = v \\ 0, & \text{otherwise.} \end{cases}$$

Note that $\{1_X(n,v)\}$ is constrained to be unit-valued for exactly
one $v \in \mathcal{V}$ at each $n \in \mathcal{N}$. Clearly, the
mappings between the source representations $\{X_n\}$ and
$\{1_X(n,v)\}$ are unique, so the entropy

$$H(1_X) = H(X)$$

is preserved in converting between them.


This  work  has  been  supported  by  the  Natural  Sciences  and 
Engineering  Research  Council  of  Canada  under  Research  Grant 
A66S8  and  by  the  Information  Technology  Research  Centre,  an 
Ontario Centre of Excellence.


Let $\mathcal{M} = \{m_i = (n_i, v_i);\ i = 1, \ldots, NV\}$ be a
sequence comprising an arbitrary permutation of the elements in the
product space $\mathcal{N} \times \mathcal{V}$. Define the indicator
process $\{\Phi(m);\ m \in \mathcal{M}\}$ by

$$\Phi(m_i) = 1_X(n_i, v_i) \quad \text{for } i = 1, \ldots, NV.$$

This is just an arbitrary one-dimensional ordering imposed on
the elements of $\mathcal{N} \times \mathcal{V}$ so the entropy

$$H(\Phi) = H(1_X) = H(X)$$

is still preserved.

The process $\{\Phi\}$ has a strong memory effect due to the
constraint on the process $\{1_X\}$. Suppose $m_i = (n_i, v_i)$ has
$\Phi(m_i) = 1$. For some $j > i$, if $m_j = (n_j, v_j)$ has
$n_j = n_i$, we observe that $\Phi(m_j) = 0$ must obtain. That is,
once a value is determined for a particular position
$n \in \mathcal{N}$, no other value can be specified for that same
position. Define the modified indicator process
$\{\Psi(m);\ m \in \mathcal{M}\}$ by

$$\Psi(m_j) = \begin{cases}
1, & \text{if } \Phi(m_j) = 1 \\
0, & \text{if } \Phi(m_j) = 0 \text{ and } \Phi(m_i) = 0 \text{ for all } i < j \text{ having } n_i = n_j \\
\Lambda, & \text{if } \Phi(m_j) = 0 \text{ and } \Phi(m_i) = 1 \text{ for some } i < j \text{ having } n_i = n_j
\end{cases}$$


Here, $\Lambda$ is the null symbol. Since knowledge of $\mathcal{M}$
allows the $\Lambda$'s to be inserted, they can be left out of the
representation of $\{\Psi\}$. Again, the entropy

$$H(\Psi) = H(\Phi) = H(X)$$

is preserved, for any choice of the ordering $\mathcal{M}$ on
$\mathcal{N} \times \mathcal{V}$.
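
The conversion to the positional representation and back can be sketched in a few lines of Python. This is only an illustration under stated assumptions: positions are indexed from 0, the function names `to_positional` and `from_positional` are hypothetical, and the ordering used in the example (scan all positions for value 1, then value 2, and so on) is just one admissible permutation of the product space.

```python
def to_positional(x, order):
    """Return the binary sequence Psi for source x under the ordering `order`.

    x     -- list of source values
    order -- list of (n, v) pairs, a permutation of positions x values
    Null symbols are omitted, since the decoder can re-insert them.
    """
    decided = set()           # positions whose value is already revealed
    psi = []
    for n, v in order:
        if n in decided:      # indicator is forced to 0: null symbol, skipped
            continue
        bit = 1 if x[n] == v else 0
        psi.append(bit)
        if bit:
            decided.add(n)
    return psi


def from_positional(psi, num_positions, order):
    """Invert to_positional: rebuild x from Psi and the known ordering."""
    x = [None] * num_positions
    bits = iter(psi)
    for n, v in order:
        if x[n] is not None:  # the encoder skipped this element
            continue
        if next(bits) == 1:
            x[n] = v
    return x


x = [2, 1, 3, 1]                                        # 4 positions, 3 values
order = [(n, v) for v in (1, 2, 3) for n in range(4)]   # value-major scan
psi = to_positional(x, order)
recovered = from_positional(psi, len(x), order)
```

The round trip recovers `x` exactly, which is the sense in which no information is lost; the output is binary whatever the alphabet size.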

We conjecture an advantage in doing lossless compression
using the process $\{\Psi\}$ for three reasons. First, $\{\Psi\}$ is
binary, regardless of the size of the source alphabet $\mathcal{V}$.
Second, $\{\Psi\}$ is one-dimensional, regardless of the natural or
usual dimensionality of the space $\mathcal{N}$ indexing the original
source process $\{X\}$. Third, the freedom in choosing $\mathcal{M}$
means that the compressibility can be maximized by using memory
modelling, combined with backward adaptation or side information, to
determine the best permutation of $\mathcal{N} \times \mathcal{V}$.

Lossless compression methods, such as the Lempel-Ziv algorithm,
work best on small alphabet sources having a low first-order
entropy. We split $\mathcal{M}$ into two contiguous segments:
$\mathcal{M}_0$, having the probability of obtaining $\Psi(m) = 0$
maximized, and $\mathcal{M}_1$, having the probability of obtaining
$\Psi(m) = 1$ maximized. These are compressed separately. The memory
model is applied to yield a good choice for $\mathcal{M}$, and the
split into $\mathcal{M}_0$ and $\mathcal{M}_1$, in the sense of
minimizing the first-order entropies of the segments.

Summary

The use of likelihood methods to treat image data
has grown significantly over the past twenty years.
Three forces continue to drive this evolution. The
first is the rapid and continued development of
instrumentation used to acquire image data, with
tomographic instrumentation in nuclear medicine
and radiology being an important early example.
A second important development was the identification
by L. Shepp and Y. Vardi in 1982 of numerical
procedures making likelihood methods feasible.
Lastly, the increasing power of digital computation
has permitted more and more complicated likelihood
methods to be used.

As in other application areas, the power of likelihood
methods for imaging relies on having an accurate
statistical model describing the available data
and how these data are influenced by the underlying
objects to be imaged. Poisson and Gaussian
processes in time and space often appear in models
that account for most of the major sources of noise
and distortion in a wide variety of imaging modalities.
Side information placing constraints on the
object to be imaged can also be of major importance.
U. Grenander's theory of object shapes, the
introduction into imaging by U. Grenander and M.
Miller of jump-diffusion processes, and the use of
Markov random fields to accommodate shape constraints
are all important developments strengthening
the use of likelihood methods for imaging.
Regularization is also significant because imaging
problems are often ill posed, leading to unstable
solutions with unconstrained maximum likelihood.
Penalty constraints, including object-model constraints,
have been found useful as a form of regularization,
as has U. Grenander's method of sieves.

My objective in this talk is to review likelihood
methods that are being used for imaging. The
original motivations from single-slice tomographic
imaging in nuclear medicine will be mentioned, but
emphasis will be placed on more recent developments,
applications, and trends.

Abstract.

We present a detailed result on Franaszek's principal state method for the
generation of runlength constrained codes. We show that, whenever the
constraints k and d satisfy k > 2d > 1, the set of "principal states" is
$\sigma_0, \sigma_1, \ldots, \sigma_{k-1}$. Thus there is no need for
Franaszek's search algorithm anymore. The counting technique used to obtain
this result also shows that "state independent decoding" can be achieved
using not more than three codewords per message. It also allows us to
compare the principal state method with other practical schemes originating
from the work of Tang and Bahl, and to use an efficient enumerative coding
implementation of the encoder and decoder.

Introduction. 

Shannon [4] considers the (d,k)-constrained channel, where the only possible
binary sequences that can be transmitted over the channel are those
containing runs of zeros of length d, ..., k (d < k). These channels can be
described by a state model where each state is indexed by the length of the
current run of zeros. Shannon defines the capacity of this channel as the
limit as $n \to \infty$ of the logarithm of the size of the set of all
sequences of length n satisfying the (d,k)-constraint divided by n.
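
To make the state model and the capacity definition concrete, the following Python sketch counts the sequences the model accepts and forms the finite-n ratio from the definition. It is an illustration under assumptions, not part of the original text: the function names are hypothetical, and counting starts in a chosen state (by default state 0, i.e. just after a one), which changes the count for finite n but not the limit.

```python
from math import log2


def count_dk_sequences(n, d, k, start_run=0):
    """Count length-n binary sequences accepted by the (d,k) state model.

    The state is the length of the current run of zeros. From state s,
    a 0 may be sent if s < k, and a 1 may be sent if s >= d.
    """
    counts = [0] * (k + 1)
    counts[start_run] = 1
    for _ in range(n):
        nxt = [0] * (k + 1)
        for s, c in enumerate(counts):
            if c == 0:
                continue
            if s < k:
                nxt[s + 1] += c   # send 0: the run of zeros grows
            if s >= d:
                nxt[0] += c       # send 1: the run of zeros resets
        counts = nxt
    return sum(counts)


def capacity_estimate(n, d, k):
    """Finite-n value of log2(number of sequences) / n."""
    return log2(count_dk_sequences(n, d, k)) / n


small_counts = [count_dk_sequences(n, 1, 10) for n in range(1, 5)]
estimate_1_7 = capacity_estimate(200, 1, 7)
```

For d = 1 and a k too large to matter at these lengths, the counts follow the Fibonacci numbers, which is a convenient sanity check on the recursion.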

A runlength constrained code, or (d,k)-constrained code, is a binary encoding
of information such that in the code sequence successive ones are separated
by at least d zeros and at most k zeros; it is thus well suited for use on a
(d,k)-constrained channel.

We shall consider fixed length codes for these purposes. Valid codewords
follow a possible path in this state model, starting at the state where the
previous codeword ended. So, a code for this state model contains several
codeword sets, each containing a variable number of words, where the
selected set depends on the previous codeword and is such that the
concatenation of that codeword with any word in the set is permissible. Since
we consider fixed length codes, the size of the code is determined by the
smallest set belonging to some state in the model.

Franaszek [3] noted that if we take a subset of all states in the model and
require the codewords to start and end in states of this subset then an
optimum subset exists. This subset is known as the set of principal states
and Franaszek described an algorithm to search for these principal states.
Another approach, presented by several authors [1, 5, 6], is to use a single
set S of codewords that satisfy the (d,k)-constraint internally. A special
sequence is put in between two codewords such that the (d,k)-constraint
remains satisfied between codewords.

The principal state method is an optimal code for systems that can be
described in the state model framework, and thus it is at least as efficient
as any of the glue methods, since the glue methods can also be described in
the state model framework.

The principal states.

Our  goal 

We start with the definition of the building blocks or basic sets $U(m)$ for
the codeword sets given the (d,k)-constraint, containing all sequences that
start and end with a "one" and satisfy the (d,k)-constraint internally. Let
$M(m)$ denote the size of $U(m)$.

In the following we shall repeatedly use the shorthand notation
$[a:b] = \{a, a+1, \ldots, b\}$.

Let $S \subset [0:k]$ denote the set of permitted channel states (not
necessarily the set of principal states). Consider the sets $V_S(n;i)$
containing all sequences starting with a run of i zeros and ending in a run
of $r \in S$ zeros and satisfying the (d,k)-constraint internally. Note that
$V_S(n;i)$ can be described using the basic sets as

$$V_S(n;i) = \bigcup_{j \in S} \{0^i\} * U(n-i-j) * \{0^j\},$$

where $U * V$ indicates the set containing all concatenations of any
sequence $x \in U$ with any sequence $y \in V$.

With these sets we can make Franaszek's state-dependent codeword sets
$W_S(n;i)$, i.e. the set of possible codewords of length n starting in state
$i \in S$ and ending in any state $j \in S$. We have

$$W_S(n;i) = \bigcup_{j} V_S(n;j).$$

Now we can formulate our goal and the result:

Given n, k, and d (with the restriction n > k > 2d > 0), find
the set $S^* \subset [0:k]$ such that:

$$W_{S^*}(n) \triangleq \max_{S \subset [0:k]} \min_{i \in S} |W_S(n;i)| = \sum_{i=0}^{k-d} |V_{S^*}(n; d+i)|.$$

Message mapping for state independent decoding

Partition the messages into sets $\mathcal{M}_i$ of sizes
$M_i = |V_{S^*}(n; d+i)|$, where $i = 0, 1, \ldots, k-d$. Let r be the
number of trailing zeros in the previous codeword. We distinguish between
the following cases:

d = 1 and r = 0: The messages in the set $\mathcal{M}_i$ are assigned to the
set $V_{S^*}(n; i+1)$.

d = 1 and r ≥ 1: The set $\mathcal{M}_0$ is assigned to $V_{S^*}(n; 1)$ and
$\mathcal{M}_1 \cup \ldots \cup \mathcal{M}_{k-1}$ are assigned to
$V_{S^*}(n; 0)$.

d > 1 and r < d: For all $i = 0, \ldots, k-d-r$ we assign to the set
$\mathcal{M}_i$ the codewords from $V_{S^*}(n; d+i)$ respectively. For
$i = k-d-r+1, \ldots, d$ we assign to the set $\mathcal{M}_i$ the codewords
from $V_{S^*}(n; \ldots + 2d - k + 1)$ respectively.

d > 1 and r ≥ d: The sets
$\mathcal{M}_0 \cup \mathcal{M}_1 \cup \ldots \cup \mathcal{M}_{k-d}$ are
assigned to $V_{S^*}(n; 0)$ and $V_{S^*}(n; 1)$ in that order.

So, it is easy to see that every message is encoded into one of two or three
different codewords, depending on r.

Enumerative  coding. 

We shall briefly indicate the application of the well-known enumerative
coding technique [2] to the generation of the (d,k)-constrained sequences.

First we determine the message subset $\mathcal{M}_i$ of the message m that
we want to transmit. Then, with the rules of the previous section we
determine the set $V_{S^*}(n;j)$ and the relative index
$i(x^n; V_{S^*}(n;j))$ of our message in the set. Finally we use the
enumerative reconstruction to produce the codeword $x^n \in V_{S^*}(n;j)$
from its index.

Let  the  codeword  be  given  as  x"  =  1 . . .  10®».  So  oo  =  j- 

Although  we  will  not  need  the  (source)  encoding  algorithm,  it  is  instructive 
to  see  how  the  index  can  be  computed  recursively  as 

0(1* )-l 

tsd 

where  o(£")  =  oj  as  given  above. 

Note that this computation produces a lexicographical ordering given the
symbol ordering "1 < 0". Also note that in order to compute the index we
only need the n + 1 numbers $|V_{S^*}(p; 0)|$ for $0 \le p \le n$.

Reconstructing $x^n$ involves producing the $a_0, \ldots$, and they can be
found recursively by the corresponding enumerative decoding algorithm.

Recent work by the authors [1] described a new approach
for constructing fixed-length (d,k) codes; these codes are
"block-decodable" - i.e., they can be decoded with no memory and no
anticipation [2] - and they do not require look-ahead at the
encoder. Furthermore, it is shown in [1] that the approach described
therein is optimal over all block-decodable codes with no
look-ahead. The new codes do not rely on a search, and they have a
very simple structure. In this talk we shall discuss how the new
approach can be combined with an error-control structure to yield
combined modulation and error control coding.

The approach in [1] is based on set-concatenatability. Let $C^n$
denote the set of binary n-tuples satisfying the (d,k) constraint.
Then a set $S = \{S_0, S_1, \ldots, S_{M-1}\}$ of disjoint subsets of
$C^n$ is called a set-concatenatable collection (SCC) if for any
$S_i, S_j \in S$ and any $x \in S_i$ there exists a $y \in S_j$ such
that $x * y \in C^{2n}$. The block codes in [1] are based on the
maximal set-concatenatable collection (MSCC), and they can encode up
to M messages where M is the size of the MSCC. (In [1] it is shown
that M is equal to the number of (d,k) sequences of length n with at
least d leading zeroes and at most k - 1 trailing zeroes.)

In  this  talk  we  demonstrate  how  the  simple  structure  of  the 
codes  in  [1]  is  easily  incorporated  into  a  joint  runlength/error- 
control  scheme.  In  this  summary  we  shall  show  how  the  approach 
of  Lee  and  Wolf  [3]  may  be  easily  adapted  to  the  new  technique. 

Let $S = \{S_0, \ldots, S_{M-1}\}$ be a MSCC with blocklength $n_1$ for
the (d,k)-constrained channel; assume that S is constructed
according to the procedure described in [1]. Let $S_p = \{0^*, 1^*\}$ be
another set-concatenatable collection - one of size two with the
smallest possible blocklength $n_2$; assume once again that $S_p$ is
constructed using the approach of [1]. It can be shown that

$$n_2 = \begin{cases} d+1, & \text{if } k > d+1; \\ d+2, & \text{if } k = d+1. \end{cases}$$


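Definition: Given $x = (x_1, x_2, \ldots, x_{n_1}) \in C^{n_1}$ and
$c \in S_p$, define a generalized parity check $h(\cdot, \cdot)$ as follows:

$$h(x,c) = \begin{cases}
0, & \text{if } \sum_{i=1}^{n_1} x_i \text{ is even and } c \in 0^* \\
0, & \text{if } \sum_{i=1}^{n_1} x_i \text{ is odd and } c \in 1^* \\
1, & \text{if } \sum_{i=1}^{n_1} x_i \text{ is even and } c \in 1^* \\
1, & \text{if } \sum_{i=1}^{n_1} x_i \text{ is odd and } c \in 0^*
\end{cases}$$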


Given S and $S_p$, we "glue" them together in two different
ways to obtain two new collections
$S^e = (S^e_0, S^e_1, \ldots, S^e_{M-1})$ and
$S^o = (S^o_0, S^o_1, \ldots, S^o_{M-1})$:

$$S^e_i = \{x * c : x \in S_i,\ c \in (0^* \cup 1^*),\ x * c \in C^{n_1+n_2},\ h(x,c) = 0\}$$

and

$$S^o_i = \{x * c : x \in S_i,\ c \in (0^* \cup 1^*),\ x * c \in C^{n_1+n_2},\ h(x,c) = 1\}.$$

Claim: $S^e$ and $S^o$ are disjoint collections such that
$|S^e| = |S^o| = M$. Furthermore, $S^e \cup S^o$ is a set-concatenatable
collection; finally, any two codewords from different elements of $S^e$
(resp., $S^o$) lie at distance at least two from one another.


If $M \ge 2^m$, we can construct a code with $d_{free} = 3$ that can
encode $2^m$ messages by numbering the elements of $S^e$ (resp., $S^o$)
with even (resp., odd) integers; the rate of the resulting code
will be $m/(n_1 + n_2)$. This is done by constructing a completely
connected trellis with $2^m$ states numbered by $\{0, 1, \ldots, 2^m - 1\}$
such that the edge from state x to state y is associated with the
"codeword" - the set of $(n_1 + n_2)$-tuples, actually - numbered
with the integer $L(x,y) = x - 2y \pmod{2^{m+1}}$. This encoding
rule guarantees a non-catastrophic encoder such that the outgoing
edges from any state have the same "parity" - i.e., are labeled
with either the elements of $S^e$ or $S^o$ but not both. This in turn
guarantees a free distance of at least three.
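
The parity property of the edge labels can be checked directly. The short Python sketch below tabulates the labels for m = 2, reading the modulus as 2^(m+1); the function name `edge_label` and the dictionary layout are illustrative assumptions. Since 2y is even, the label has the same parity as the state x it leaves, so every state uses only even-numbered or only odd-numbered sets.

```python
def edge_label(x, y, m):
    """Label L(x, y) = x - 2y (mod 2^(m+1)) of the edge from state x to y."""
    return (x - 2 * y) % (2 ** (m + 1))


m = 2
states = range(2 ** m)

# labels[x] lists the labels on the edges leaving state x, ordered by y
labels = {x: [edge_label(x, y, m) for y in states] for x in states}

# every outgoing label shares the parity of the state it leaves
same_parity = all(lab % 2 == x % 2 for x in states for lab in labels[x])

# the labels leaving one state are pairwise distinct
distinct = all(len(set(labels[x])) == len(labels[x]) for x in states)
```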

Example: We shall construct a code with (d,k) = (2,4), n = 9,
R = 2/9. Start with a (2,4) code with blocklength $n_1 = 6$:

$S_0$ = {000010, 010001}    $S_1$ = {000100, 010010}

$S_2$ = {001000, 100001}    $S_3$ = {001001, 100010}.

We now use as the "parity check" $S_p = \{0^*, 1^*\}$ where
$0^*$ = {000, 010} and $1^*$ = {001, 100}. This yields the collection

$S^e_0$ = {000010001, 010001000}    $S^o_0$ = {000010010, 010001001}

$S^e_1$ = {000100100, 010010010}    $S^o_1$ = {000100010, 010010001}

$S^e_2$ = {001000100, 100001000}    $S^o_2$ = {001000010, 100001001}

$S^e_3$ = {001001000, 100010010}    $S^o_3$ = {001001001, 100010001}

Using the completely connected trellis with four states labeled
according to the rule above, we can encode data at a rate R = 2/9
with free distance three. By comparison, if we use the approach
in [3] - which employs codewords that can be freely concatenated
regardless of encoder state - the best rate that could be achieved
with blocklength n = 9 would be R = 1/9. To obtain a rate
close to 2/9 using the method of [3] would require a blocklength
of n = 14, with the resulting increase in complexity. ■
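
The example can be checked mechanically. The Python sketch below encodes the generalized parity check and the gluing rule, then confirms that every listed 9-bit word meets the membership conditions and that words from different sets of one collection are at Hamming distance at least two. Several things here are assumptions made for illustration: the function names, the labels `LISTED_E` and `LISTED_O` for the two columns, and the block-level reading of the (d,k) constraint (inner runs of zeros between d and k, leading and trailing runs at most k). The check tests membership only; it does not claim the listed sets are exhaustive under that reading.

```python
from itertools import combinations

D, K = 2, 4
S = [{"000010", "010001"}, {"000100", "010010"},
     {"001000", "100001"}, {"001001", "100010"}]
ZERO_STAR = {"000", "010"}
ONE_STAR = {"001", "100"}

LISTED_E = [{"000010001", "010001000"}, {"000100100", "010010010"},
            {"001000100", "100001000"}, {"001001000", "100010010"}]
LISTED_O = [{"000010010", "010001001"}, {"000100010", "010010001"},
            {"001000010", "100001001"}, {"001001001", "100010001"}]


def satisfies_dk(word, d=D, k=K):
    """Assumed block-level (d,k) test: inner runs in [d,k], edge runs <= k."""
    ones = [i for i, b in enumerate(word) if b == "1"]
    if not ones:
        return len(word) <= k
    lead, trail = ones[0], len(word) - 1 - ones[-1]
    inner = [b - a - 1 for a, b in zip(ones, ones[1:])]
    return lead <= k and trail <= k and all(d <= r <= k for r in inner)


def h(x, c):
    """Generalized parity: 0 for (even, 0*) or (odd, 1*), otherwise 1."""
    odd_weight = x.count("1") % 2
    in_one_star = 1 if c in ONE_STAR else 0
    return odd_weight ^ in_one_star


def glue(S_i, parity):
    """All x*c with x in S_i, c in 0* or 1*, constraint kept, h(x,c) = parity."""
    return {x + c for x in S_i for c in ZERO_STAR | ONE_STAR
            if satisfies_dk(x + c) and h(x, c) == parity}


def hamming(a, b):
    return sum(p != q for p, q in zip(a, b))


def min_cross_distance(collection):
    """Smallest distance between words taken from two different sets."""
    return min(hamming(a, b)
               for A, B in combinations(collection, 2)
               for a in A for b in B)


listed_ok = all(LISTED_E[i] <= glue(S[i], 0) and LISTED_O[i] <= glue(S[i], 1)
                for i in range(4))
```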

Moreover, when k > 2d + 1, we can construct quasi-systematic
codes whose codewords are made up of d merging bits, $n_1 - d$
information bits, and d + 1 checking bits - provided we revise
$h(\cdot, \cdot)$ suitably.

A  class  of  single  bit-shift  error  detecting  and/or  correcting 
codes  can  also  be  constructed  by  defining  an  appropriate  parity 
function.  Furthermore,  these  codes  are  able  to  deal  with  bit-shift 
errors  crossing  the  border  of  two  adjacent  codewords. 

Systematic Runlength-Limited Codes
For Single Error Detection
In the Magnetic Recording Channel.

Runlength-limited codes are used in magnetic
recording. A runlength-limited code is characterized by its
(d,k) constraints: the d constraint is the minimum run
of consecutive zeros and the k constraint is the
maximum run of consecutive zeros.

Methods  for  mapping  unconstrained  binary 
sequences  to  (d,k)  constrained  sequences  exist.  Such 
mappings  are  called  modulation  codes.  In  this  article,  we 
assume  the  existence  of  a  modulation  code  and  are 
concerned  with  detecting  errors  which  occur  in  magnetic 
recording. 

Errors  which  occur  in  magnetic  recording  can  be 
categorized  as  drop  in  errors,  drop  out  errors,  or  shift 
errors.  A  drop  in  error  occurs  when  a  zero  is  changed  to  a 
one.  A  drop  out  error  occurs  when  a  one  is  changed  to  a 
zero.  A  shift  error  occurs  when  the  pattern  01  is  changed  to 
10 or the pattern 10 is changed to 01.

To  detect  single  errors  for  an  unconstrained  binary 
symmetric  channel,  a  single  parity  bit  is  adjoined  to  each 
information  sequence.  With  a  runlength-limited  code  for 
the  magnetic  recording  channel,  the  detection  of  single 
errors  is  not  so  trivial.  The  parity  must  be  chosen  to 
maintain  the  runlength  constraints  and  to  detect  all  three 
types  of  channel  errors. 
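
A small Python sketch shows why a plain parity bit is not enough here. The three helper functions model the error types defined above on a bit string; their names and the sample word are illustrative assumptions. Drop-in and drop-out errors change the number of ones and so flip the parity, whereas a shift error moves a one without changing the weight and therefore passes a simple parity check unnoticed.

```python
def drop_in(bits, i):
    """Drop-in error: the zero at position i becomes a one."""
    assert bits[i] == "0"
    return bits[:i] + "1" + bits[i + 1:]


def drop_out(bits, i):
    """Drop-out error: the one at position i becomes a zero."""
    assert bits[i] == "1"
    return bits[:i] + "0" + bits[i + 1:]


def shift(bits, i):
    """Shift error: the pattern 01 or 10 at positions i, i+1 is swapped."""
    pair = bits[i:i + 2]
    assert pair in ("01", "10")
    return bits[:i] + pair[::-1] + bits[i + 2:]


def parity(bits):
    """Ordinary parity: number of ones modulo 2."""
    return bits.count("1") % 2


word = "0100100"
after_drop_in = drop_in(word, 0)
after_drop_out = drop_out(word, 1)
after_shift = shift(word, 1)
```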

The  purpose  of  this  article  is  to  present  a 
construction  of  systematic  runlength-limited  block  codes 
for  detecting  single  errors  in  the  magnetic  recording 
channel,  whether  a  drop  in,  drop  out,  or  shift  error.  The 
codes can be designed for any (d,k) constraints. The
encoder table has 3(k+1) entries. Error detection is
performed by a simple arithmetic calculation. Optimal
systematic single error-detecting codes are obtained, for
the (1,k) and (2,k) constraints with k > 2(d+1), by
truncating the constructed codes.

Summary

Recently a new class of maximum runlength limited error control
codes (RLLECCs) has been identified [1-3]. They are
formed by taking an appropriate coset of a linear transparent
error control code and thereby inherit the error control
characteristics and implementation advantages of the parent
code. A new class of parent codes which realised optimum
RLLECCs with minimum distance 4 was recently identified
in [2]. These codes were defined in terms of their parity
check matrices and, although linear, they are not cyclic.
Consequently, practical realisation of these schemes necessitates a
network of EXOR gates for encoding, whilst decoding more obviously
requires storage of lists of syndromes and corresponding
error patterns. However, the parity check matrices of these
optimum codes possess some cyclic-like properties, and in this
paper we exploit these features to develop simplified encoding
and decoding algorithms which readily lend themselves to
implementation using VLSI microcircuits and require no storage
of syndromes and error patterns. These circuits can be
realised simply with EXOR-gates, AND-gates, a 4-input majority
gate and shift registers. Furthermore, we find that the
circuits are general for a particular runlength constraint, the
only difference for higher rate codes being the number of shifts
required to perform the encoding and decoding operations.

Encoding and decoding algorithms for three particular parent
ECCs which, when modified appropriately, yield runlength
constraints 2, 6 and 14 will be considered, although similar
algorithms could also be developed for other cases. Circuits
which perform the coding operations will also be presented
and an error performance comparison of the new algorithms
with conventional syndrome decoding will be carried out. By
way of example, figure 1 shows a comparison between the
performance of the new algorithms and syndrome decoding for
a (64,46) code with a maximum runlength constraint of 6.
Whilst clearly syndrome decoding performs the best, there is
no significant degradation in performance for the less complex
new algorithms A and B. These new algorithms are also
readily adaptable to perform soft decision decoding and the
potential coding gains using soft decision versions of the
decoders will be considered.

Figure 1: Error performance of (64,46) code with different
decoders

Abstract

A technique for joint modulation and error correction
is described. In order to construct a (d,k) modulation
code with κ information bytes that corrects up to t errors,
a single error-detecting (inner) block modulation code is
combined with an (outer) [κ + t, κ] Reed-Solomon code.

The performance of this scheme is compared to the related
scheme using an (inner) block modulation code and an
(outer) [κ + 2t, κ] Reed-Solomon code, as well as to the
traditional method in magnetic recording, which involves
the concatenation of an error-correcting code with a sliding
window modulation code.

1  Introduction 

A well-known method to construct a (d,k) modulation code with
error-correcting capabilities is to concatenate an inner block
modulation code with an outer Reed-Solomon code [2]. To be more
precise, let I be a binary block code of size 2'' for which the
cascading of codewords gives sequences with runlengths of at least
d and at most k 0's between any two consecutive 1's. For combined
error protection against up to t errors and modulation of κ
information bits, we concatenate the inner code I with an outer
[κ + 2t, κ] Reed-Solomon code O over GF(2').

In this paper, we consider a modification of the above-described
scheme by choosing the inner code to be single error-detecting.
Since a single error in an inner codeword will thus always result in
a symbol erasure for the outer code, O only needs to be a [κ + t, κ]
Reed-Solomon code in order to correct up to t bit errors.
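
The saving in redundancy follows from the standard errors-and-erasures property of Reed-Solomon codes, written out below as a worked inequality. The symbols e (number of symbol errors at unknown positions), f (number of erasures) and r (number of parity symbols) are introduced here for illustration and do not appear in the original text; the argument also assumes that every corrupted inner codeword is flagged, which is what single error detection provides when each inner word contains at most one error.

```latex
% An outer Reed-Solomon code with r parity symbols has minimum
% distance r + 1 and can correct e errors together with f erasures
% whenever
\[
  2e + f \le r .
\]
% Errors at unknown positions only (f = 0): correcting e = t errors
% requires r = 2t, i.e. a [\kappa + 2t, \kappa] code.
% Erasures only (e = 0): correcting f = t erasures requires r = t,
% i.e. a [\kappa + t, \kappa] code.
```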

The idea of using an error-detecting inner code is not completely
new. In fact, this scheme can be considered as a special case of
Ytrehus' general scheme for constructing runlength-limited codes
for the mixed-error channel [6] (by choosing s = 0 in this scheme,
while Ytrehus himself accents the case s = 2).

The single error-detecting capability of the inner code can be
established by choosing (d,k) constrained sequences of either odd
or even weight [4], [5]. However, by using more advanced methods
like the one presented in [1], we occasionally obtain higher rates,
especially when k is close to 2d.

2  Comparisons 

The traditional way to establish modulation and error protection
in magnetic recording involves the concatenation of an interleaved
error-correcting code with a sliding window modulation code [2].
We have compared the performance of this traditional scheme with
the two block schemes from the previous section. Both analytical
and simulation methods have been used. Special attention has
been paid to the important case (d,k) = (1,7).

We have calculated and compared for a fixed number of information
bits the total redundancy for each method. Among other results,
it turned out that the block scheme with the error-detecting
inner code minimizes the redundancy when k is relatively small.
By establishing bounds on the rate difference for the inner block
codes with and without error detection, the two block schemes
could be compared.

Note that a bit error often causes a violation of the (d,k)
constraints. For a block scheme, this leads to a symbol erasure, since
the received (inner) word no longer corresponds to an
(outer) symbol. Hence there is a kind of interaction between the
demodulator and the decoder in the block schemes, which seems
to be missing in the traditional scheme. Since it is hard to measure
the effect of this interaction analytically, we have run various
simulations. Noise was injected based on observations made in [3].
For the (d,k) = (1,7) case, the results seem to indicate that it is
hard to improve upon the traditional scheme. However, it was also
noticed that for the block scheme, the k-constraint can be lowered
from 7 to 5 without losing rate.

A typical encoding configuration for a magnetic or optical recording
channel consists of encoding the information bits with an
error-correcting code, generally a Reed-Solomon code, followed by a
(d,k) constrained code [1].

In general, Reed-Solomon codes can handle the most common types
of errors: random errors and peak shifts. A random error can be
of two types: a 0 becomes a 1, denoted 0→1, or a 1 becomes a
0, denoted 1→0. Peak shifts are also of two types: 01→10 or
10→01.

However, there are other types of errors that cause a catastrophic
failure due to loss of synchronization. They are deletion of a
symbol (0 or 1) and insertion of a symbol (0 or 1). Deletions and
insertions are not as common as the other types of error. Still, if
we are able to determine how many insertions or deletions occurred
in an interval, then by inserting or deleting a proper number of
symbols we are left with a burst error that will either be
corrected by the outer error-correcting code or, if uncorrectable,
will at least have a limited length.

Consider (1,7) sequences (the method can be generalized to any
(d,k) sequence). We make the following 1-1 mapping between a
(1,7) sequence and symbols in $Z_7$ (i.e., set of integers modulo 7):
to each run of 0's, we associate the number of zeros minus one.

If we denote by L the length of the binary string, by ℓ the length
of the 7-ary string and by S the sum of the symbols in the 7-ary
string, these three parameters are related by L = S + 2ℓ.
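
The mapping and the length relation can be illustrated with a short Python sketch. Two things are assumptions made for the illustration: the function names, and the framing convention that every run of zeros is closed by a single one. Under that convention a symbol s stands for s + 1 zeros followed by a one, which occupies s + 2 binary positions, so summing over the ℓ symbols gives L = S + 2ℓ.

```python
def runs_to_binary(symbols):
    """Map 7-ary symbols (0..6) to a (1,7) binary string.

    Each symbol s stands for a run of s + 1 zeros; every run is closed
    by a single 1 (an assumed framing convention).
    """
    return "".join("0" * (s + 1) + "1" for s in symbols)


def binary_to_runs(bits):
    """Inverse mapping: each run of zeros before a 1 gives (length - 1)."""
    return [len(run) - 1 for run in bits.split("1")[:-1]]


symbols = [0, 3, 6]
bits = runs_to_binary(symbols)

L = len(bits)            # length of the binary string
ell = len(symbols)       # length of the 7-ary string
S_sum = sum(symbols)     # sum of the 7-ary symbols
```

The same relation accounts for the count given in the block construction: two boundary symbols whose sum is 6 contribute 6 + 2·2 = 10 binary symbols.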

At the 7-ary level, we encode the information using an $[n, n-2]$
block code, where $n \ge 7$. The first and the last symbols in a
block are redundant, while the middle $n-2$ symbols carry the
information. We require that in each block, the sum of the symbols
modulo 7 is 0. The last symbol in a block and the first symbol in
the next block are chosen in such a way that their sum is equal to 6.
Thus, we are inserting exactly 10 binary symbols between blocks
in the binary sequence. It is important to have a fixed amount of
redundancy while attempting to recover synchronization. Finally,
we set the initial condition $a_0 = 0$.
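
One way to satisfy both requirements at once is sketched below. This is an interpretation, not the authors' encoder: it assumes the first symbol of each block is fixed by the previous block's last symbol (so that the pair sums to 6), and the last symbol is then chosen to make the block sum vanish modulo 7. The function name `encode_blocks` is hypothetical.

```python
def encode_blocks(info_blocks, first_symbol=0):
    """Wrap each list of n - 2 information symbols (values 0..6) in two
    redundant symbols.

    Assumptions of this sketch:
      * the very first symbol is 0 (the initial condition),
      * the last symbol makes the block sum 0 modulo 7,
      * the next block starts with 6 - last, so the boundary pair sums to 6.
    """
    encoded, first = [], first_symbol
    for info in info_blocks:
        if any(not 0 <= s <= 6 for s in info):
            raise ValueError("information symbols must lie in 0..6")
        last = (-(first + sum(info))) % 7
        encoded.append([first] + list(info) + [last])
        first = 6 - last
    return encoded


blocks = encode_blocks([[1, 2, 3, 4, 5], [6, 6, 6, 6, 6]])
for block in blocks:
    print(block, sum(block) % 7)
```

A boundary pair with symbol sum 6 occupies 6 + 2·2 = 10 binary symbols, which matches the fixed redundancy stated above.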

At the receiving end, if a 7-ary sequence $b_0, b_1, b_2, \ldots$ has been
received, and errors have occurred, including possible insertions
and/or deletions of symbols, we show how to recover synchronization
with high probability under the following conditions (that are
determined by the error statistics of the channel):

1. At most 3 errors in at most A consecutive 7-ary blocks of
length n have occurred, say in blocks m, m + 1, ..., m + r,
where r ≤ A − 1.

2. After the last block in error, say block m + r, there are at
least s error-free blocks.

3. The length n of each block is at least 7 (in general, k − d + 1).

Under these conditions, we present a method that will allow us
to determine how many symbols have been deleted, allowing for
recovery of synchronization.

References 

[1] P. H. Siegel, "Recording Codes for Digital Magnetic Storage,"
IEEE Trans. on Magnetics, Sept. 1985, pp. 1344-1349.




Construction  of  Insertion/Deletion 
Correcting  RLL  Codes 

Patrick  A.H.  Bours 


Department  of  Mathematics  and  Computing  Science, 
Eindhoven University of Technology,

PO-Box  513,  5600  MB  Eindhoven, 

The  Netherlands. 

Abstract 


An algorithm is presented for the construction of fixed length
insertion/deletion correcting RLL codes. This algorithm uses one or more
fixed length q-ary codes with given Lee distance to generate fixed
length binary (d, k)-constrained codewords. This construction can be
used for all possible (d, k)-constraints.

Introduction 

In [1] the authors describe a way to construct peak-shift error
correcting variable length RLL codes. In their construction they use a
Hamming-metric based code to generate (d, k)-constrained codewords.
However, due to the way the problem is stated, it is more natural to use
Lee-metric based codes as generating codes. This was also
noticed by Roth and Siegel in [2]. The main idea is to use a Lee-metric
based code to encode the runlengths of an RLL code. This allows us
not only to correct peak-shift errors but also insertions and deletions
of zeroes.

Preliminaries 

In the sequel we assume that an error has some maximal size of s bits.
So, in case of a shift error, a one is shifted over at most s positions,
and in case of an insertion (resp. deletion) error, at most s zeroes are
inserted (resp. deleted). Now q will be defined as $2s + 1$. Furthermore,
if $x = 10^{a_1} 10^{a_2} \ldots 10^{a_l}$ is a (d, k)-constrained word, then the Integer
Representation (IR) $\beta$ of x is defined by

$\beta_i := (a_i - d) \bmod q, \quad i = 1, 2, \ldots, l.$  (1)

The absolute weight $W_{abs}(\beta)$ of a q-ary vector $\beta$ of length l is defined
as

$W_{abs}(\beta) := \sum_{i=1}^{l} \beta_i,$  (2)

where the sum is taken over the integers.

In the sequel $C_1$ will denote a q-ary code of length l and minimum
Lee-distance t, and $C_2$ will denote a binary (d, k)-constrained code of
length $N_2 := n + l \cdot (d + 1)$ for some $n > 0$.

If $t = 2 \cdot r \cdot s + 1$, then $C_1$ is capable of correcting r errors, where each
error has size at most s.

The  Construction 

We are now able to give a construction for the code $C_2$, using the code
$C_1$.

Construction:

1. Take $\beta \in C_1$ with $W_{abs}(\beta) = L$ and $(n - L) \equiv 0 \pmod q$.

2. Take all binary words x of length $N_2$ with IR $\beta$, that also satisfy the (d, k)
constraints. Do this as follows:

(a) Define $r_1$ and $r_2$ such that

$k - d = r_1 \cdot q + r_2, \quad r_1 = \lfloor (k - d)/q \rfloor, \quad r_2 = (k - d) \bmod q.$  (3)

(b) Find all integer vectors $\gamma$ of length l such that

$\sum_{i=1}^{l} \gamma_i = (n - L)/q,$
$0 \le \gamma_i \le r_1$ if $\beta_i \in \{0, 1, \ldots, r_2\}$,
$0 \le \gamma_i \le r_1 - 1$ if $\beta_i \in \{r_2 + 1, r_2 + 2, \ldots, q - 1\}$.  (4)

(c) Take as a codeword in the code $C_2$ the word

$x = 10^{a_1} 10^{a_2} \ldots 10^{a_l}, \quad a_i = d + \beta_i + \gamma_i \cdot q.$  (5)

(d) Repeat step 2(c) for all $\gamma$ of step 2(b).

3. Repeat step 2 for all $\beta$ of step 1.

Decoding is now done by taking l runs at a time, and then decoding
the IR of the word they form together. Due to the fact that we have
assumed that an error has maximal size s, and the alphabet of the
code $C_1$ has size $2 \cdot s + 1$, it is always possible to distinguish between
insertion and deletion errors.

In general not all codewords $\beta$ of the code $C_1$ can be used to generate
binary (d, k)-constrained codewords of length $N_2$. This is due to the
fact that it does not always hold that

$W_{abs}(\beta) \equiv n \pmod q.$  (6)

This can be solved by adding a parity symbol $\beta_{l+1}$ to a q-ary codeword
$\beta$, such that

$\sum_{i=1}^{l+1} \beta_i + (l + 1) \cdot (d + 1) \equiv N_2 \pmod q,$  (7)

or equivalently

$\beta_{l+1} = \left( N_2 - \sum_{i=1}^{l} \beta_i - (l + 1) \cdot (d + 1) \right) \bmod q.$  (8)

In order to increase the number of codewords of the code $C_2$, we can
use more q-ary codes $C_1$, say $C_1^i$ for $i = 1, 2, \ldots, p$, where the code $C_1^i$
has length $l_i$. For the lengths $l_i$ it must hold that

(9)

Furthermore, if we assume that only 1 run can be affected by errors,
we can take q to be s + 1. This is due to the fact that the code $C_2$ has
a fixed length $N_2$.

References
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Abstract. We construct q-ary block codes that allow correction
of specific types of double errors. These codes can be used as
codes for correction of peak-shifts, deletions and insertions of
zeros in (d,k)-sequences applied in magnetic recording. For
single peak-shifts over $t \le (k-d)/2$ positions left or right, the
codes have dimension $N = q^r$, $K = q^r - (r+1)$, $q = k-d+1$. An
additional condition on the structure of the code gives
transparent block codes which are used to control the maximum
binary length of the code words. Encoding and decoding are
done by simple algorithms without using look-up tables,
enumeration or denumeration procedures and therefore the code
length may be large. The rate of the overall encoding approaches
$(2 \log_2(k-d+1))/(k+d+2)$ for large code word lengths.

A. CORRECTION of SINGLE PEAK-SHIFTS

For the transmission through (d,k)-constrained channels, q-ary
code words (where $q = k-d+1$) are converted to binary sequences
satisfying the (d,k)-constraint by replacing q-ary components
by binary strings of i, $d \le i \le k$, consecutive zeros followed by a
single one. If at most a single peak-shift of value t occurs, then
the output code word of the encoder $x \in C$ and the input word z
of the decoder are related by the equation $z = x + e$, where + is
the componentwise addition of integers, and $e = (e_1, e_2, \ldots, e_N)$ is
an error vector with integer components $e_i$, that belongs to one
of the following three classes:

1) $e_i = 0$ for $1 \le i \le N$ (no errors);  (1)

2) $e_i = 0$ for $1 \le i \le N-1$ and $e_N \ne 0$;  (2)

3) $e_j = t$, $e_{j+1} = -t$ for some $1 \le j \le N-1$, $e_i = 0$ for $i \ne j, j+1$.  (3)

We have related the problem of peak-shift correction to the
construction of block codes over the ring of integers modulo q
correcting double errors of the type (3) and a single error in the
last component of the code word (2).

Let $N = q^r$, where $q = k-d+1 \ge 3$ is a prime, and $r \ge 1$ is an
arbitrary integer. For peak-shift correction we use a q-ary
linear code C of length N defined by the parity check matrix
$H = \|h_{i,j}\|$ with two rows of following elements $h_{1,j} \in GF(q) =
\{0, 1, \ldots, q-1\}$ and $h_{2,j} \in GF(q^r)$:

$h_{1,j} = j \bmod q, \quad 1 \le j \le N;$  (4)

$h_{2,j+1} = h_{2,j} + w_j, \quad 1 \le j \le N-1,$  (5)

where $w_1, w_2, \ldots, w_{N-1}$ are distinct nonzero transposed r-tuples
with components from GF(q), $h_{2,1}$ is the transposed r-tuple $T =
(1, 0, \ldots, 0)$, and + in (5) represents componentwise modulo q
addition of r-tuples.

Transparency. As was pointed out in [1], for the maximum length
control the code C must be transparent, that is, the all-ones
word $1 = (1, 1, \ldots, 1)$ of length N must belong to the code C. This
condition can be satisfied in several ways. For example, for any
prime $q \ge 3$ and $r \ge 1$ (except the case q=3 and r=1), as an
element $w_i$ we may use the ordinary q-ary representation of its
index $i = 1, 2, \ldots, q^r - 1$ considered as an integer.

Examples. The parity check matrices $H_3$ for N = 9 (q=3, r=2)
and $H_7$ for N = 7 (q=7, r=1) constructed according to the
described procedure are given below.

$H_3 = \begin{pmatrix} 1&2&0&1&2&0&1&2&0 \\ 1&2&1&1&2&1&1&2&1 \\ 0&0&0&1&2&0&2&1&0 \end{pmatrix}, \quad H_7 = \begin{pmatrix} 1&2&3&4&5&6&0 \\ 1&2&4&0&4&2&1 \end{pmatrix}.$

Proposition 1. The linear q-ary code defined by the parity check
matrix H as given in (4)-(5) has length $N = q^r$, and $K \ge
N-(r+1)$ information symbols. The code corrects peak-shifts
(1)-(3) of size t, $t \le (k-d)/2$, and is transparent.

B. PEAK-SHIFTS, DELETIONS and INSERTIONS of ZEROS

The proposed method can be used for the correction of other
types of errors. In this section we present codes that can correct
in the (d,k)-sequence a single distortion of the following type:

a) a peak-shift on (k-d)/2 or less positions;
b) a deletion or an insertion of (k-d)/2 or less zeros
between adjacent ones.

The list of possible types of error vectors (1)-(3) is extended
with the following

$e_j = t$ for some $1 \le j \le N$, $e_i = 0$ for $1 \le i \ne j \le N$.  (6)

In fact, errors of type (2) are particular cases of (6), and thus
later we consider only three types (1), (3) and (6).

Let $N = q^r$, where $q = k-d+1 \ge 3$ is a prime, $r \ge 2$ is an
arbitrary integer, and let $\gamma$ be a primitive element of $GF(q^r)$
such that the element $\gamma^{q-3}(1-\gamma)$ is not an integer in $GF(q^r)$. For
values of q and r such that $q^r < 128$, Tables from [2, Chapter
10] can be used to select a primitive element $\gamma$ that satisfies this
condition. For the correction of errors (1)-(3) and (6) we use a
q-ary linear code C defined by the parity check matrix

$H = \begin{pmatrix} 1 & 1 & 1 & 1 & \ldots & 1 & 1 & 1 & 1 & \ldots & 1 \\ 2 & 2 & 3 & 4 & \ldots & q-1 & q-1 & 0 & 1 & \ldots & q-1 \\ 0 & \gamma & \gamma^2 & \gamma^3 & \ldots & \gamma^{q-2} & \gamma^{q-1} & \gamma^{q} & \gamma^{q+1} & \ldots & \gamma^{N-1} \end{pmatrix}$  (7)

with elements $h_{1,j}, h_{2,j} \in GF(q)$ and $h_{3,j} \in GF(q^r)$. The code
dimensions are $N = q^r$, $K = q^r - (r+2)$. The code defined by this
parity check matrix is transparent. This follows from the
definitions and the fact that the summation of all elements in
$GF(q^r)$ gives 0 for any $r \ge 1$. As an example for q=3 and r=2 the
parity check matrix is shown below ($\gamma = (1,0)^{Tr}$, and $GF(3^2)$ is
represented as in Tables of [2]),

$H = \begin{pmatrix} 1&1&1&1&1&1&1&1&1 \\ 2&2&2&0&1&2&0&1&2 \\ 0&1&2&2&0&2&1&1&0 \\ 0&0&1&2&2&0&2&1&1 \end{pmatrix}.$

Error correction uses the syndrome $S = (S_1, S_2, S_3)^{Tr} = H \cdot z^{Tr}$.

Proposition 2. The linear q-ary code defined by the parity check
matrix H as given in (7) is transparent, has length $N = q^r$, and
$K = N-(r+2)$ information symbols. The code corrects
peak-shifts of size t (1)-(3) and t insertions and deletions of
zeros (6) for $t \le (k-d)/2$.
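
The two-row construction (4)-(5) of Section A, with $w_i$ taken as the q-ary representation of i, can be sketched in a few lines. The digit order (low-order digit first) is an assumption of this sketch; it was chosen because it reproduces the rows 120120120, 121121121 and 000120210 printed for $H_3$. The function names are hypothetical.

```python
def qary_digits(i, q, r):
    """q-ary representation of i as an r-tuple, low-order digit first (assumed order)."""
    return [(i // q ** m) % q for m in range(r)]


def peak_shift_parity_check(q, r):
    """Rows of H for N = q**r: row 1 is j mod q, the remaining r rows hold
    the r-tuples h_{2,j} with h_{2,1} = (1,0,...,0) and h_{2,j+1} = h_{2,j} + w_j."""
    n = q ** r
    first_row = [j % q for j in range(1, n + 1)]
    column = [1] + [0] * (r - 1)
    columns = [column]
    for j in range(1, n):
        w = qary_digits(j, q, r)
        column = [(c + wi) % q for c, wi in zip(column, w)]
        columns.append(column)
    return [first_row] + [[col[m] for col in columns] for m in range(r)]


def is_transparent(rows, q):
    """The all-ones word is a code word iff every row of H sums to 0 modulo q."""
    return all(sum(row) % q == 0 for row in rows)


H3 = peak_shift_parity_check(3, 2)
H7 = peak_shift_parity_check(7, 1)
for row in H3:
    print(row)
print(is_transparent(H3, 3), is_transparent(H7, 7))
```

With this choice of $w_i$ the case q=3, r=1 fails the transparency check, in line with the exception noted in the transparency paragraph.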
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In this talk we present a new construction of codes
with zero or constrained power of signal at zero frequency,
of codes that detect synchronization failures,
and of a family of sequences with low periodic correlation.
The first problem has recently received considerable
attention (see [1]), while the two latter problems
have long been studied [2], [3].

Let $A[k, n, M, d]$ denote a k-ary code of length n,
with M words and Hamming distance d. We assume
that the alphabet letters are denoted by the kth degree
roots of unity.

The running digital sum of a code A is, by definition,

$S(A) := \max_{a \in A} S(a).$

Following [4], let us call a family $\{A_n\}$ of codes of growing
length n dc-constrained if

$S(A_n) < c(n),$

where $c(n) = o(n)$ is a slowly growing function in n.

The capacity of a code to detect synchronization errors
is determined by the code separation [2]. For k-ary
vectors $a = (a_0, a_1, \ldots, a_{n-1})$ and $b = (b_0, b_1, \ldots, b_{n-1})$
let us introduce an n-vector

$T_i(a, b) = (a_i, a_{i+1}, \ldots, a_{n-1}, b_0, b_1, \ldots, b_{i-1}), \quad 1 \le i \le n-1.$

The code separation is defined by

$\rho(A) := \min_{a, b, c \in A,\ 1 \le i \le n-1} d(T_i(a, b), c).$

Codes with $\rho(A) > 0$ are called comma-free.

Finally, the periodic correlation of two complex vectors
a, b is defined by

n—  1 

^a,6(^)  =  0<T<71-1. 

(=0 
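
For readers who want to experiment, the sketch below computes a periodic correlation numerically. It assumes the common convention $\theta_{a,b}(\tau) = \sum_{i=0}^{n-1} a_i \overline{b_{(i+\tau) \bmod n}}$ for $0 \le \tau \le n-1$; other texts conjugate or shift the opposite vector, so this choice, like the helper names, is an assumption of the sketch.

```python
import cmath


def periodic_correlation(a, b, tau):
    """Periodic correlation under the assumed convention
    sum_i a[i] * conj(b[(i + tau) mod n])."""
    n = len(a)
    return sum(a[i] * b[(i + tau) % n].conjugate() for i in range(n))


def roots_of_unity_vector(exponents, k):
    """Turn integer letters 0..k-1 into kth roots of unity."""
    return [cmath.exp(2j * cmath.pi * e / k) for e in exponents]


a = roots_of_unity_vector([0, 1, 2, 0, 2], 3)
b = roots_of_unity_vector([1, 1, 0, 2, 0], 3)
values = [abs(periodic_correlation(a, b, tau)) for tau in range(len(a))]
print([round(v, 3) for v in values])
```

For vectors whose letters are roots of unity, the zero-shift autocorrelation equals the length n and every correlation value is bounded by n in magnitude.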

Consider the following code construction [4-5]. Let
$q = p^m$, where p is an odd prime, and let $\chi(\cdot)$ be a
multiplicative character of the field $F_q$ of order $k \mid (q-1)$.
Consider the set P of monic polynomials f(x), $1 \le \deg f \le r$,
that satisfy the following restriction: in the expansion
into irreducibles $f = \prod_i g_i^{l_i}$ all $l_i \le k-1$. Consider a
code A with its vectors defined by

$a_i^{(f)} = \chi(f(\beta_i)), \quad 0 \le i \le q-1, \quad f \in P,$  (1)

where $(\beta_0, \ldots, \beta_{q-1})$ is some ordering of the field
elements.

Theorem 

(i)  [5].  Let  k  >Z.  The  construction  (1)  defines  the  dc- 
constrained  code  A  with  the  parameters  [k,q,  M  ~ 
q’’  ,d  >  ((k  —  l)/k)(q  -  2r,yq)  -  2r]  and  S(^)  < 
srp’/^(l  -j-logp). 

(ii)  [5].  Let  q  be  an  odd  prime  and  k  >  3.  The  code 

A  defined  by  (1)  contains  a  comma-free  subcode 
A\[k,q,  M  ~  >  ((k  -  \)/k)(q  -  2r^)  -  2r\ 

with  p(A)  >  ((k  -  l)/k)(q  -  Ar,yq(l  -f  logq)). 

(iii)  For  any  two  cyclically  distinct  vectors  and 
defined  by  (1), 

Onu),n^>,)(T)  <  (2r  -  1)^9 

The proof utilizes estimates of incomplete character
sums similar to the Vinogradov-Polya inequality [6].
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In emission tomography, useful for studying brain function, a source
of radioactivity is ingested by a person, say as sugar, and a Poisson
number, n(b), of radioactive emissions arises in each box (pixel), b,
of the brain depending on the brain activity there, and $\lambda(b) = E\,n(b)$
is sought. Each emission (not directly observable) makes an independent
Markovian transition to some detector unit, d, with probability
p(b,d), where p(b,d) is known from the geometry and
performance of the detectors. We measure n*(d), the total number of
counts in each d, and wish to estimate $\lambda(b)$ to get an image of the brain
activity, say during counting, speaking, or other function.

For each $\lambda$ there is a likelihood (see Appendix A), $\Lambda(\lambda)$, to observe
n* and one popular approach to reconstructing or estimating $\lambda$ is to
seek a maximum likelihood estimator (MLE). Surprisingly enough,
ideas of information theory have provided useful insight into the
theoretical understanding of MLE even though entropy doesn't appear to
be directly involved.

No one knows how to produce an MLE directly but the so-called
EM algorithm is used beginning with an initial $\lambda^0$ to produce ever
more likely estimates $\lambda^1, \lambda^2, \ldots$
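
The abstract does not write out the EM update. The sketch below uses the multiplicative update usually associated with EM for Poisson emission data, $\lambda^{k+1}(b) = \lambda^k(b) \sum_d p(b,d)\, n^*(d) / \sum_{b'} \lambda^k(b') p(b',d)$, divided by $\sum_d p(b,d)$ in case the rows of p do not sum to one. The toy matrix, counts and function names are invented for illustration.

```python
import math


def expected_counts(lam, p):
    """Mean detector counts: sum over pixels b of lam[b] * p[b][d]."""
    n_detectors = len(p[0])
    return [sum(lam[b] * p[b][d] for b in range(len(lam))) for d in range(n_detectors)]


def log_likelihood(lam, p, counts):
    """Poisson log-likelihood of the observed detector counts."""
    means = expected_counts(lam, p)
    return sum(-m + n * math.log(m) - math.lgamma(n + 1) for m, n in zip(means, counts))


def em_step(lam, p, counts):
    """One multiplicative EM update of the pixel intensities."""
    means = expected_counts(lam, p)
    updated = []
    for b in range(len(lam)):
        sensitivity = sum(p[b])  # probability that an emission in b is detected
        backprojection = sum(p[b][d] * counts[d] / means[d] for d in range(len(counts)))
        updated.append(lam[b] * backprojection / sensitivity)
    return updated


p = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]   # toy p(b, d): 2 pixels, 3 detectors
counts = [30, 25, 45]                    # toy n*(d)
lam = [50.0, 50.0]                       # initial estimate
history = [log_likelihood(lam, p, counts)]
for _ in range(20):
    lam = em_step(lam, p, counts)
    history.append(log_likelihood(lam, p, counts))
print([round(v, 3) for v in lam], round(history[-1], 4))
```

Each step keeps the estimate positive and never decreases the likelihood, which is the sense in which the iterates are "ever more likely".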

The only rigorous proof [1] of convergence of $\lambda^k$ to a limit maximizing
$\Lambda(\lambda)$ is heavily information theoretic. Unfortunately this limiting
MLE was seen [2] not to be a robust estimate, due to the fact that
n(b) is small and hence statistically noisy, and indeed was totally
useless as a practical image. If the MLE were not unique then the
various ML estimators could be averaged, and since $\Lambda(\lambda)$ is seen to be
log concave (see Appendix A), an estimate could be obtained which
is both smooth as well as maximally likely. On empirical grounds
it was conjectured [2] in 1988 that the MLE was, under general
conditions, unique. Very recently, again using ideas of information theory,
Charles L. Byrne succeeded [3] in formulating a general and natural
hypothesis on p(b,d) under which the conjecture is true. This dashes all
hope that smooth MLEs exist in practical emission tomography. The
present approaches involve either stopping the iteration early,
smoothing at each step or at the end, or maximizing posterior likelihood with
a Gibbs prior.

I  hope  information  theory  will  continue  to  shed  light  on  emission 
tomography. 


Appendix  A 

So as not to interrupt the main ideas this paragraph serves to
explain to the uninitiated reader what $\Lambda(\lambda)$ is and why it is log concave.

Observe that n*(d) are independent Poisson variables since n(b)
are independent and Poisson and each transition from each b to some
d is done independently. Thus if n(b,d) is the number of emissions
in b that become counts in d then by the thinning property of the
Poisson law, n(b,d) are all independent Poisson for different b's and
for different d's. But $n^*(d) = \sum_b n(b,d)$ and so n*(d) are also Poisson
and independent.

Thus $\Lambda(\lambda) = \prod_d e^{-\lambda^*(d)} \lambda^*(d)^{n^*(d)} / n^*(d)!$ where $\lambda^*(d) = E\,n^*(d) =
E \sum_b n(b,d) = \sum_b \lambda(b) p(b,d)$. We seek an MLE which maximizes
$\Lambda(\lambda)$. It is easy to see from this formula that the Hessian of $\log \Lambda(\lambda)$
is negative definite and so $\Lambda$ is log concave. For more details see [4].


References 

[1]  Csiszar,  I.  and  Tusnady,  G.  (1982),  Information  geometry  and 
alternating  minimization  procedures,  Math.  Inst.  Hungarian 
Academy  of  Sciences. 

[2]  Shepp,  L.  A.  and  Vanderbei,  R.  J.,  New  insights  into  emission 
tomography  via  linear  programming,  notes  prepared  for  Nato 
meeting  on  Formulation,  Handling,  and  Evaluation  of  Medical 
Images,  12-23  September  1988,  Portugal,  widely  distributed. 

[3] Byrne, Charles L., Iterative image reconstruction algorithms based
on cross-entropy minimization, IEEE Trans. on Inf. Theory, to
appear.

[4] Vardi, Y., Shepp, L. A., and Kaufman, L., A statistical model
for positron emission tomography, J. Amer. Stat. Assoc. 1985,
vol. 80, pp. 8-20.




Recursive CR-Type Bounds and the EM Algorithm:
Applications to ECT Image Reconstruction

A.O. Hero* and J.A. Fessler**

*Dept. of Electrical Engineering and Computer Science and **Division of Nuclear Medicine
The University of Michigan, Ann Arbor, MI 48109-2122


Abstract 

We give a class of iterative algorithms to monotonically approximate
submatrices of the CR matrix bound on the covariance of
any estimator of a vector parameter $\theta$. A natural implementation
of the iterative algorithm employs a "complete data - incomplete
data" formulation similar to that underlying the EM parameter
estimation algorithm. Our results make it feasible to compute
CR-type bounds for previously intractable problems involving a
large number of "nuisance parameters," such as arise in image
reconstruction.


I.  Summary 

The Cramer-Rao (CR) bound on estimator covariance is an important
tool for predicting fundamental limits on best achievable
parameter estimation performance [5], predicting the impact of
side information and constraints on estimation performance [3],
and obtaining optimal experimental designs [1]. For a vector
parameter $\theta \in \Theta \subset R^n$ the upper left $p \times p$ matrix of the inverse of
the $n \times n$ Fisher information matrix $F_Y$ provides the CR lower bound
on the minimum achievable covariance of any unbiased estimator
of $\theta_1, \ldots, \theta_p$, $p \le n$. Equivalently, the first p rows of $F_Y^{-1}$ provide
the CR bound. The method of sequential partitioning [4] for
computing the upper left $p \times p$ submatrix of $F_Y^{-1}$ and Cholesky
based Gaussian elimination techniques [2] for computing the p
first rows of $F_Y^{-1}$ are efficient direct methods for obtaining the
CR bound but require $O(n^3)$ floating point operations and $O(n^2)$
memory storage. Unfortunately, in many practical cases of interest,
e.g. when there are a large number of nuisance parameters,
high computation and memory requirements make direct
implementation of the CR bound impractical. For example, in the
case of image reconstruction for a 256 × 256 pixelated image $F_Y$
is $256^2 \times 256^2$ so that direct computation of the CR bound on
estimation errors in a small region of the image requires on the
order of $256^6$ or $10^{14}$ floating point operations and on the order
of 4 GByte of memory storage!

In this paper we give a class of iterative algorithms for computing
columns of the CR bound which requires only $O(n^2)$ floating
point operations per column of $F_Y^{-1}$. These algorithms fall into
the class of "splitting matrix iterations" [2]. The inverse of this
splitting matrix should be sparse and simply determined. The
splitting matrix is chosen based on purely algebraic or purely
statistical considerations to ensure that a valid lower bound results
at each iteration of the algorithm. By embedding the parameter
estimation problem into a particular complete data - incomplete
data setting, and applying a version of the "data processing
theorem" for Fisher matrices, the Fisher matrix $F_X$ for the
complete data set can frequently be used as a splitting matrix. This
complete-incomplete data setting is similar to that which
underlies the classical formulation of the EM algorithm. The EM
algorithm generates a sequence of estimates $\{\theta^k\}$ for $\theta$ which
successively increase the likelihood function and converge to the
maximum likelihood estimator. In a similar manner, our
algorithm generates a sequence of tighter and tighter lower bounds
on estimator covariance which converge to the actual CR matrix
bound. The iterative algorithm allows one to compute the CR
bound for estimation problems that would have been intractable
by exact methods due to the large dimension of $F_Y$.
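
The abstract does not spell out the recursion, so the following is only a generic splitting-matrix sketch. It assumes a diagonal splitting matrix F with $F - F_Y$ positive semidefinite and iterates $x \leftarrow x + F^{-1}(e_j - F_Y x)$ from $x = 0$, which converges to column j of $F_Y^{-1}$. The 2 × 2 matrices and the function name are invented for illustration.

```python
def cr_bound_column(fisher, splitting_diag, column, iterations):
    """Approximate one column of the inverse Fisher matrix by a splitting iteration.

    fisher         -- n x n Fisher information matrix F_Y (list of rows)
    splitting_diag -- diagonal of the splitting matrix F (assumed to dominate F_Y)
    column         -- index j of the wanted column
    Returns the final iterate and the history of its j-th entry, which is the
    running approximation to the bound on the variance of parameter j.
    """
    n = len(fisher)
    x = [0.0] * n
    history = []
    for _ in range(iterations):
        residual = [
            (1.0 if i == column else 0.0) - sum(fisher[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        x = [x[i] + residual[i] / splitting_diag[i] for i in range(n)]
        history.append(x[column])
    return x, history


fisher = [[2.0, 1.0], [1.0, 2.0]]
splitting = [3.0, 3.0]            # diag(3, 3) - fisher is positive semidefinite
x, history = cr_bound_column(fisher, splitting, 0, 100)
print([round(v, 6) for v in x], [round(v, 4) for v in history[:4]])
```

Each pass costs one matrix-vector product, and in this example the diagonal entry increases monotonically toward its limit, mirroring the "tighter and tighter lower bounds" described above.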

We conclude with an implementation of the recursive algorithm
for bounding the minimum achievable estimator error covariance
for problems arising in emission computed tomography
(ECT). For the case where the complete data is selected as the
set of image pixel emission counts in each of d "detector tubes",
which is the standard choice of complete data for the EM image
reconstruction algorithm, $F_X$ is diagonal. Furthermore, due to
the sparseness of the tomographic system response matrix the
computation of each column of the CR bound matrix recursion
only requires $O(n)$ memory storage as compared to $O(n^2)$ for the
general algorithm. We show that in general the rate of convergence
depends on the image intensity and the tomographic system
response matrix. We have applied the iterative algorithm to
compute the CR bound for practical estimation tasks including:
reconstruction of a small region-of-interest (ROI), estimation of
total uptake in a ROI, estimation of dose distribution heterogeneity
in a ROI, and impact of anatomical side information on ROI
reconstruction.
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Summary

Atmospheric turbulence severely limits the effective
resolution of a long-integration image obtained by an
uncompensated, ground-based telescope. Because of this
fact, most ground-based telescopes typically collect a
sequence of short-exposure images. The simplest and most
widely used model for the intensity of the kth
short-exposure image is

$i_k(x) = h_k(x) * s(x),$  (1)

where s(x) represents the light-intensity distribution of
the object being viewed and $h_k(x)$ represents the
point-spread function due to the telescope's finite aperture and the
turbulent nature of the Earth's atmosphere. The
point-spread functions are commonly modeled in terms of the
following quantities: $K_k$ is a constant that depends, among other
factors, on the duration of the kth data-collection interval,
F denotes the Fourier transform operator, a known, binary
function describes the telescope's aperture, and $\Phi_k(u)$
describes the turbulence-induced phase-aberrations that occur
during the kth data-collection interval.
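
The model can be tried out numerically. The sketch below is one-dimensional and discrete, and it assumes the widely used form $h_k(x) = K_k |F\{A(u) e^{j\Phi_k(u)}\}(x)|^2$, where the symbol A(u) for the binary aperture function and the normalization of $K_k$ (the PSF sums to one) are assumptions of this sketch. Convolution is taken to be circular, and all names and sample values are illustrative.

```python
import cmath


def dft(values):
    """Plain discrete Fourier transform (O(n^2), fine for a small example)."""
    n = len(values)
    return [
        sum(values[u] * cmath.exp(-2j * cmath.pi * u * x / n) for u in range(n))
        for x in range(n)
    ]


def point_spread_function(aperture, phase):
    """h(x) proportional to |F{A(u) exp(j*Phi(u))}|^2, scaled to sum to one."""
    pupil = [a * cmath.exp(1j * phi) for a, phi in zip(aperture, phase)]
    power = [abs(v) ** 2 for v in dft(pupil)]
    total = sum(power)
    return [v / total for v in power]


def circular_convolve(h, s):
    """Noise-free frame intensity i(x) = (h * s)(x) with circular boundaries."""
    n = len(s)
    return [sum(h[(x - y) % n] * s[y] for y in range(n)) for x in range(n)]


aperture = [1, 1, 1, 0, 0, 0, 0, 0]          # binary aperture function (assumed symbol A)
phase = [0.3, -0.7, 1.1, 0, 0, 0, 0, 0]      # one realization of the phase aberration
obj = [0, 0, 5, 0, 0, 2, 0, 0]               # object intensity s(x)
h = point_spread_function(aperture, phase)
frame = circular_convolve(h, obj)
print([round(v, 3) for v in frame])
```

Because the PSF is nonnegative and sums to one, the frame conserves the total object intensity; a photon-limited measurement would then draw Poisson counts with these intensities as means.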

In all real situations, the intensities $\{i_k(x)\}$ are
not detected perfectly. Instead, they are corrupted by
some type of noise. Examples include read-out noise
for charge-coupled devices (CCDs) and photon noise for
photon-counting cameras. For this talk, I will discuss
the situation for which the data are corrupted by photon
noise. In this case, the data collected in the kth frame
are denoted as $d_k(x)$ and, conditioned on the object
intensity s(x) and the point-spread function $h_k(x)$, $d_k(x)$
is a Poisson process whose intensity is $i_k(x)$. Further, for
$k \ne j$, the processes $d_k(x)$ and $d_j(x)$ are statistically
independent.


The estimation problem I address is then one of
estimating the desired, information-bearing signal s(x)
from the measured data $\{d_k(x)\}$. When the atmospheric
phase-aberrations are known, the point-spread
functions $\{h_k(x)\}$ are known and a multi-frame
deconvolution problem must be solved. When the
atmospheric phase-aberrations are not known, the problem
is much more difficult. In this case, the point-spread
functions are not known and a multi-frame
blind-deconvolution problem must be solved. This is the
problem I address.

The phase-aberrations $\{\Phi_k(u)\}$ can be modeled as a
collection of deterministic functions or they can be
modeled as a collection of random processes that fluctuate
randomly with k. In this talk, I consider the first
situation. However, when sound statistical models are
available for the phase-aberration processes they should be
used. The estimation problem is stated as one of forming
the maximum-likelihood estimates of the information-bearing
signal s(x) and the phase-aberrations $\{\Phi_k(u)\}$
from the data $\{d_k(x)\}$.

In the talk, a numerical technique based on the
expectation-maximization (EM) algorithm will be
presented for forming solutions numerically. Examples
using both simulated data and real telescope data will
also be presented to demonstrate the usefulness of the
technique.
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At present, there are no known examples of planetary
systems other than our solar system, in which the orbits
of the planets all lie nearly in the equatorial plane of the
sun. It is conjectured that, soon after its birth, the sun
was surrounded by a disk composed of dust and gas, out
of which the planets agglomerated, the residual material
being blown away by high energy winds along the polar
axis of the sun. In fact, astronomers believe that many
young stars are surrounded by extended, essentially planar
objects composed of dust and gas, called "circumstellar
disks," and that these are the environment in which
planetary systems develop. Apparently, this brief episode
of stellar evolution is part of a broader scenario, only
loosely understood, thought to begin when a cold, rotating
protostellar core condenses inside a large molecular
cloud to form a star-disk system. Eventually, the star
enters the main sequence (i.e., hydrogen-burning) stage,
possibly accompanied by a planetary complex and other
disk remnants.

Aside from our own solar system, the direct optical
evidence for the existence of circumstellar disks is
sparse. An extended object, thought to be a disk, was
observed in 1984 around β Pictoris. In addition, the
"infrared excess" observed around some stars is thought to
be starlight absorbed by dust particles in a disk and
reradiated at longer wavelengths, resulting in significant
energy at infrared and other frequencies. Finally, there
is indirect evidence for large planets derived from
perturbations in stellar trajectories and velocities. (Direct
imaging of planets is beyond current technology.)

This talk concerns the problem of detecting circumstellar
disks based on Hubble Space Telescope (HST) observations.
We are currently analyzing images recently
obtained with the Wide Field Planetary Camera (WF/PC)
of several nearby, pre-main sequence stars, both single
and binary. Despite the advantages of placing a telescope
outside the earth's atmosphere, the images taken with
the WF/PC are still considerably degraded, mainly due
to the severe blurring resulting from the infamous
aberration in the optical system. The point spread function
(PSF) for the WF/PC has significant mass over a radius
on the order of one arc-second, and exposure times are
limited by its instability. In addition, there are several
other factors which limit the amount of information that
is readily accessible (e.g., visually evident), including the
usual limitations imposed by photon-limited data, stability
problems with the spacecraft (resulting in "trailing"),
local variations in the point spread function, variations
in detector sensitivity and cosmic ray strikes.


The  usual  approach  to  image  restoration  results  in  a 
single  “restored  image,”  deemed  to  capture  the  original 
brightness  pattern  without  degrading  effects,  or  at  least 
to  suppress  noise  and  enhance  resolution.  This  approach 
is  non-dedicated  and  nonparametric:  except  for  specific 
knowledge of the image formation process, it incorporates
only generic assumptions, for example constraints
on the positivity, smoothness, or entropy of the brightness
pattern. Examples of such techniques include those
based on pseudo-inverses, maximum entropy, maximum
likelihood,  and  Bayesian  inference  with  “prior”  and  “pos¬ 
terior”  distributions. 

In  contrast,  we  formulate  the  problem  of  the  exis¬ 
tence  of  disks  as  one  of  statistical  hypothesis  testing.  The 
probability  distribution  of  the  data  is  derived  by  model¬ 
ing  the  image  formation  process  using  the  semiclassical 
model  of  photodetection  (which  means  that  quantum  ef¬ 
fects  are  accounted  for  only  at  the  detection  end  of  the 
system)  as  well  as  other  important  factors  such  as  bias 
correction, quantum efficiency, and read-out noise. Basically,
we wish to test the hypothesis $H_0$: star alone vs.
$H_1$: star plus something. The test statistic is based on
the  (generalized)  likelihood  ratio.  This  is  less  straight¬ 
forward  than  it  might  appear.  For  one  thing,  nearby 
“calibration  stars”  provide  only  estimates  of  the  PSF, 
and  hence  there  is  a  random  factor  in  the  mean  bright¬ 
ness  pattern  which  has  to  be  “subtracted”  from  the  ob¬ 
served  one.  In  addition,  it  is  necessary  to  adjust  for  the 
difference  in  overall  brightness  between  the  calibration 
and  target  stars.  Finally,  it  is  also  necessary  to  account 
for  bias  correction,  variability  of  detector  sensitivity,  and 
electrical  noise.  Results  will  be  reported  on  at  least  one 
set  of  observations  of  several  young  stars  in  the  Taurus- 
Auriga  star  forming  complex. 
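
As a hedged illustration of the hypothesis-testing formulation above, the sketch below computes a generalized likelihood ratio statistic for photon counts under a pure Poisson model. It is a simplification and not the authors' statistic: the PSF is assumed to be known exactly, bias and read-out noise are ignored, and the alternative is the unrestricted (saturated) Poisson mean. The names `glr_star_alone`, `counts`, and `psf` are hypothetical.

```python
import math


def glr_star_alone(counts, psf):
    """Generalized likelihood ratio statistic for H0: mean = a * psf (star alone).

    counts: observed photon counts per pixel (non-negative numbers)
    psf:    assumed-known point spread function values per pixel (positive)
    Returns (a_hat, stat): the ML brightness under H0 and twice the gap between
    the maximized Poisson log-likelihoods of the saturated model and of H0.
    """
    a_hat = sum(counts) / sum(psf)  # ML estimate of the star brightness under H0
    stat = 0.0
    for y, h in zip(counts, psf):
        mu0 = a_hat * h  # fitted mean count under H0
        if y > 0:
            stat += 2.0 * (y * math.log(y / mu0) - (y - mu0))
        else:
            stat += 2.0 * mu0
    return a_hat, stat


if __name__ == "__main__":
    psf = [0.1, 0.2, 0.4, 0.2, 0.1]
    print(glr_star_alone([10, 20, 40, 20, 10], psf))  # data proportional to the PSF
    print(glr_star_alone([10, 20, 40, 20, 40], psf))  # extra light in one pixel
```

A large value of the statistic is evidence against "star alone"; a realistic version would also have to propagate the uncertainty in the estimated PSF, as the abstract points out.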

Model-based image reconstruction methods such as
maximum-likelihood (ML) or maximum a posteriori (MAP)
estimation require a signal model describing the relationship
between the image parameters to be estimated and the
measurement data. Two of the parameters of interest in
magnetic resonance imaging (MRI) are the tissue spin density,
A, and the spin-spin decay time constant, $T_2$. This paper presents
a mathematical model for the signals observed in standard
two-dimensional MRI experiments and discusses how this
model is incorporated into a maximum a posteriori parameter
estimation algorithm to compute image estimates of spin
density and spin-spin decay time. A detailed description of
this work can be found in [1].

The basic response of a magnetically sensitive population of
nuclei to excitation in a magnetic resonance experiment was
described by Bloch [2] as an exponentially decaying sinusoid
whose frequency is proportional to the strength of the static
magnetic field to which the nuclei are exposed. In an MRI
experiment, magnetic fields which vary with spatial position
are employed to create a relationship in the observed signal
between the frequency and phase of a sinusoidal signal
component and the spatial position from which the signal
originated.

The parameterized signal model which forms the basis of our
MAP image reconstruction algorithm is based upon three
assumptions: 1) The frequency and phase encoding magnetic
fields used for spatial localization vary linearly with spatial
position. 2) Voxels of dimension $D_x \times D_y$ cm$^2$ are small enough
that the spin density and spin-spin decay time constant within
a single voxel are constant. 3) The loss of signal coherence
due to static magnetic field inhomogeneity results in signal
attenuation which can be represented as an exponential decay
with its own time constant. Under these assumptions the signal
emitted from a single voxel at position $(x, y)$ takes the form

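[Equation not recoverable from the source. The surviving fragments indicate a product of two sinc factors with parameters $c_x D_x$ and $c_y D_y$, an exponential decay, and a sinusoid at the frequencies $f_x(x)$ and $f_y(y)$.]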

The frequencies of oscillation $f_x(x) = c_x x$ and $f_y(y) = c_y y$ are
linear functions of the encoding gradient strengths $c_x$ and $c_y$
and position $(x, y)$. The sinc-function parameters $c_x D_x$ and
$c_y D_y$ are equal to the frequency bandwidths across the $(x, y)$
dimensions of the voxel. The full two-dimensional MRI signal
is represented as a superposition of sinc-modulated,
exponentially decaying sinusoids of the type above, one from
each of the $M \times N$ voxels which form the image field.


Under  the  assumption  of  additive,  white,  Gaussian  noise  in 
the  MRI  measurement  data,  the  maximum  likelihood 
estimates  of  the  spin  density  and  spin-spin  decay  image 
parameters  are  those  parameters  which  minimize  the  squared 
error  between  the  measurement  data  and  a  signal  estimate 
computed  from  the  image  parameters  using  the  model 
described above. To compute these image parameter
estimates, we have implemented a form of the iterative
expectation maximization (EM) algorithm of Dempster, Laird,
and Rubin [3]. This algorithm has the property of
decomposing the $2 \times M \times N$-dimensional least-squares
optimization problem stated above into $M \times N$ independent
2-dimensional minimizations at each iteration, allowing for
efficient parallel implementation of the algorithm. The
algorithm also incorporates a Markov random field prior
constraining the roughness of the computed image estimates,
similar to the technique employed by Miller and Roysam [4]
for emission tomography. The MRI parameter estimates
produced by the algorithm, then, are MAP estimates rather
than ML estimates.
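
To make the "independent 2-dimensional minimization" concrete, here is a minimal sketch of a least-squares fit of one amplitude and one decay constant to samples of a single decaying exponential. It is not the EM algorithm or the full sinc-modulated signal model of this paper: the model `amp * exp(-t / t2)`, the grid search over candidate decay constants, and the name `fit_amplitude_and_decay` are assumptions made for illustration only.

```python
import math


def fit_amplitude_and_decay(times, samples, t2_grid):
    """Least-squares fit of samples ~ amp * exp(-t / t2) over a grid of t2 values.

    For each candidate t2 the best amplitude has a closed form, so the
    2-dimensional problem reduces to a 1-dimensional search.
    Returns (amp, t2, squared_error) for the best candidate.
    """
    best = None
    for t2 in t2_grid:
        basis = [math.exp(-t / t2) for t in times]
        # Closed-form least-squares amplitude for this decay constant.
        amp = sum(b * s for b, s in zip(basis, samples)) / sum(b * b for b in basis)
        err = sum((s - amp * b) ** 2 for b, s in zip(basis, samples))
        if best is None or err < best[2]:
            best = (amp, t2, err)
    return best


if __name__ == "__main__":
    times = [0.01 * i for i in range(50)]
    data = [3.0 * math.exp(-t / 0.08) for t in times]
    grid = [0.02 * k for k in range(1, 11)]
    print(fit_amplitude_and_decay(times, data, grid))
```

Under additive white Gaussian noise this squared-error criterion coincides with the maximum likelihood criterion, which is the property the text relies on.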

Magnetic  resonance  image  reconstruction  is  typically 
performed using a 2-dimensional Fourier transform. Under
the  assumption  that  the  signal  emitted  from  a  single  voxel  is 
simply  a  non-decaying  sinusoid,  the  Fourier  transform  is 
exactly  the  maximum  likelihood  solution  and  further 
computation  is  unnecessary.  However,  the  more  detailed 
signal  model  stated  above  provides  additional  information 
about  the  behavior  of  the  MRI  signal  that  allows  for 
improved  image  estimates.  The  model  used  in  our  MAP 
algorithm exploits the fact that the sinc-modulated,
exponentially  decaying  sinusoidal  signal  components  oscillate 
in  phase  with  one  another,  whereas  the  Gaussian  noise  is 
modelled  as  the  superposition  of  non-decaying  sinusoids  with 
random  amplitudes  and  uniformly  distributed,  random 
phases.  The  Fourier  transform  approach  models  signal  and 
noise  components  identically,  while  the  MAP  method  uses  the 
differences  between  signal  and  noise  to  reduce  the  sensitivity 
of  image  parameter  estimates  to  distortion  by  noise.  For  this 
reason,  the  MAP  algorithm  produces  image  parameter 
estimates which are of higher precision (i.e., lower variance)
than those computed using Fourier transform based
techniques.

Abstract

We  consider  the  problem  of  restoring  images  corrupted  by  speckle 
noise  with  iid-exponential  statistics.  This  model  occurs  in  a  variety  of 
coherent  imaging  problems  including  diffuse-target  radar  imaging.  Our 
estimation  approach  is  obtained  by  statistical  inference  on  the  wavelet 
coefficients  of  the  logarithm  of  the  image.  Under  a  large-sample  approx¬ 
imation,  the  inference  problem  takes  a  very  simple  form.  Estimates  can 
be  obtained  under  various  noise/resolution  tradeoffs. 

1. Diffuse-Target Imaging

Maximum-likelihood  (ML)  estimation  methods  have  recently  been 
explored  for  forming  images  of  diffuse  radar  targets  [1,2].  The  diffuse- 
target  model  assumes  independent  scatterers  and  models  the  reflectivity 
of  the  target  as  an  uncorrelated,  Gaussian  random  field.  This  model  has 
been  described  in  the  electromagnetics  literature  and  used  in  the  signal 
processing  literature. 

We denote by $c(x,y)$ the reflectivity of the target in range and
cross-range coordinates and we define

$S(x,y)\,dx\,dy := E\left[\,|c(x,y)\,dx\,dy|^2\,\right]$. $S(x,y)$ is the scattering function
of the target and is the desired image of the target. The received data
are a linear transform of the reflectivity.

It is known that ML estimation of a whole function such as $S(x,y)$
from finite data is an ill-posed problem and requires regularization. A
possible solution consists in representing $S(x,y)$ in terms of a small
number of basis functions and estimating their coefficients [2]. In this
paper, we present a regularization method based on a wavelet representation
for $\ln S(x,y)$. This method offers the ability to capture significant
components of $\ln S(x,y)$ at different resolution levels. This capability for
multiresolution estimation allows for increased flexibility over
single-resolution regularization techniques such as those in [2]. There are two
essential motivations for parameterizing $\ln S(x,y)$ instead of $S(x,y)$
itself.  The  first  is  the  need  to  preserve  positivity  of  scattering  function 
estimates.  The  second  is  that  in  the  log  domain,  the  estimation  problem 
can  be  set  up  as  the  problem  of  restoring  an  image  corrupted  by  additive 
non-Gaussian  noise.  By  application  of  statistical  hypothesis-testing  prin¬ 
ciples  a  solution  to  this  estimation  problem  can  be  derived  under  various 
noise/resolution  tradeoffs. 

2.  Statistical  Model 

Denote by $N$ the number of data, $U$ a discretization of the $(x,y)$
domain into a set of $N$ points, $L$ the linear transform (assumed to be
invertible) that maps the discretized reflectivity onto the data $r$, and define
$p(x,y) := |(L^{-1} r)(x,y)|^2$. In the radar community, $p(x,y)$ is referred
to as the preimage and is often used as an estimator for $S(x,y)$. The
preimage is a sufficient statistic for the ML estimation problem and is
analogous to the classical periodogram in spectrum estimation [2]. It
unfortunately exhibits poor statistical properties that follow directly from
the statistical model for the reflectivity,

$$p(x,y) = S(x,y)\,u(x,y), \quad (x,y) \in U, \qquad (1)$$

where $\{u(x,y),\ (x,y) \in U\}$ are iid exponential random variables with
unit mean. In image processing terminology, $u(x,y)$ is speckle
noise. Model (1) fits a broad class of coherent imaging problems, as well
as power-spectrum estimation problems. We use a logarithmic transform
to map the multiplicative model (1) into an additive one:

$$\ln p(x,y) - \ln 2 - \gamma = \ln S(x,y) + \epsilon(x,y), \quad (x,y) \in U. \qquad (2)$$

In (2), $\gamma = 0.57721$ is Euler's constant, and
$\{\epsilon(x,y) := \ln u(x,y) - \ln 2 - \gamma\}$ are zero-mean iid additive noise samples
with pdf denoted by $p_\epsilon(\cdot)$. The log-likelihood for $S$ is given by

$$l(S) = \sum_{(x,y) \in U} \ln p_\epsilon\left[\ln p(x,y) - \ln 2 - \gamma - \ln S(x,y)\right].$$

The unconstrained ML estimator is simply the preimage and is
unsatisfactory.  In  the  following  we  show  how  wavelets  can  be  used  to 
achieve  regularization  of  the  estimates. 

3.  Wavelet  Regularization 

We adopt the following discrete orthonormal wavelet representation
for $\ln S(x,y)$:

$$\ln S(x,y) = \sum_{(j,k,l) \in \Lambda} a_{jkl}\,\psi_{jkl}(x,y), \quad (x,y) \in U, \qquad (3)$$

where $\{\psi_{jkl}(x,y)\}_{(j,k,l) \in \Lambda}$ is a two-dimensional wavelet basis for $U$, $j$
and $(k,l)$ are the scale and location parameters in the discrete index set
$\Lambda$, respectively, and $\{a_{jkl}\}$ are the wavelet coefficients for $\ln S(x,y)$.
Similarly, we introduce the wavelet coefficients $\{b_{jkl}\}$ and $\{e_{jkl}\}$ for the
scaled log-preimage and for the additive noise $\epsilon$ in (2), respectively.
From (2), we obtain $b_{jkl} = a_{jkl} + e_{jkl}$, $(j,k,l) \in \Lambda$, in which $\{e_{jkl}\}$ is
interpreted as an additive noise corrupting the wavelet coefficients of
$\ln S(x,y)$. The estimation problem consists in estimating $\{a_{jkl}\}$ given the
transformed data (sufficient statistics) $\{b_{jkl}\}$. A possible scheme is
proposed in [3] and outlined below.

Under a simple technical condition on the wavelet transform used,
a good large-sample approximation consists in assuming that $\{e_{jkl}\}$ are
iid Gaussian [3]. Then the log-likelihood for the wavelet coefficients of
the scattering function can be maximized over each wavelet coefficient
independently. If $\ln S(x,y)$ is smooth enough, the wavelet coefficients
$\{a_{jkl}\}$ decay rapidly at fine scales [4]. This behavior is to be contrasted
with that of the wavelet coefficients for the noise $\{e_{jkl}\}$, which have
scale-independent variance. This property can be used to discriminate
between signal and noise components of the observations. The
significance of each wavelet coefficient can be tested by application of a
likelihood ratio test (LRT). By application of this classical regression
technique, only significant wavelet components of $\ln S(x,y)$, regardless
of their scale, are retained in the regularized wavelet representation.
Various  noise/resolution  tradeoffs  can  be  obtained  by  selecting  the 
significance  level  of  the  LRT  appropriately.  The  complexity  of  the  esti¬ 
mation  algorithm  is  linear  in  the  number  of  pixels  of  the  image. 
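
The coefficient-by-coefficient test described above can be sketched in one dimension as follows. This is an illustration under stated assumptions rather than the scheme of [3]: a 1-D orthonormal Haar transform stands in for the 2-D wavelet basis, the test is a plain threshold at `z` noise standard deviations (what a likelihood ratio test of a zero coefficient reduces to under the Gaussian approximation with known variance), and the noise standard deviation is taken as $\pi/\sqrt{6}$, the standard deviation of the logarithm of a unit-mean exponential variable. The constant shift of the log-noise is ignored here, since in an orthonormal Haar basis it only changes the scaling coefficient, which is always kept. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)
NOISE_STD = math.pi / math.sqrt(6.0)  # std of ln(u) for unit-mean exponential u


def haar_forward(x):
    """Orthonormal Haar transform; len(x) must be a power of two.

    Returns [scaling coefficient, coarsest details, ..., finest details].
    """
    coeffs = []
    approx = list(x)
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details = [(a - b) / SQRT2 for a, b in pairs]
        approx = [(a + b) / SQRT2 for a, b in pairs]
        coeffs = details + coeffs  # finer details end up later in the list
    return approx + coeffs


def haar_inverse(c):
    """Inverse of haar_forward."""
    approx = list(c[:1])
    pos = 1
    while pos < len(c):
        details = c[pos:pos + len(approx)]
        pos += len(approx)
        nxt = []
        for a, d in zip(approx, details):
            nxt += [(a + d) / SQRT2, (a - d) / SQRT2]
        approx = nxt
    return approx


def regularize(log_preimage, z=3.0):
    """Keep only detail coefficients that exceed z noise standard deviations."""
    b = haar_forward(log_preimage)
    kept = [b[0]] + [c if abs(c) > z * NOISE_STD else 0.0 for c in b[1:]]
    return haar_inverse(kept)


if __name__ == "__main__":
    print(regularize([5.0, 5.1, 5.0, 5.1, 5.0, 5.1, 5.0, 5.1]))
    print(regularize([0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]))
```

Raising `z` retains fewer coefficients (smoother estimate), lowering it retains more (higher resolution), which mirrors the noise/resolution tradeoff controlled by the significance level of the LRT. The cost is linear in the number of samples.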

1 Summary

The  zero-mean  delta-correlated  complex  circular  Gaussian 
random  field  model  is  commonly  used  as  a  spatial  model 
for  the  joint  statistics  of  the  complex  amplitudes  of  the  pix¬ 
els  in  a  radar  or  coherent  optical  image  [1,  3  and  references 
therein].  Two  deficiencies  in  the  model  are  the  lack  of  the 
ability  to  model  heavy-tailed  distributions  and  spatial  corre¬ 
lation.  Tails  of  the  empirical  distributions  of  real  imagery  are 
often  heavier  than  those  predicted  by  the  Gaussian  model. 
Real  imagery  can  also  exhibit  spatial  correlation,  an  effect 
that  is  not  captured  by  the  classical  delta-correlated  Gaus¬ 
sian  model. 

We  introduce  a  generalization  of  the  Gaussian  speckle 
model  called  the  Markov  random  field  (MRF)  product  model. 
This  model  is  formed  as  the  pixel-by-pixel  product  between 
a  nonnegative  spatial  random  process  T,  called  the  texture 
process, and a spatially-white process S, called the speckle
process.  An  essential  property  of  this  MRF  product  model 
is  that  it  admits  the  heavy  tails  and  spatial  correlation  seen 
in  real  imagery. 

We  propose  a  particular  MRF  product  model  in  which 
the  texture  process  T  is  represented  by  a  transformed  Gaus¬ 
sian  MRF  (TGMRF)  [4].  The  TGMRF  is  a  nonnegative 
MRF  generated  through  a  one-parameter  nonlinear  trans¬ 
formation  of  a  Gaussian  MRF.  We  then  discuss  parameter 
estimation  in  the  MRF  product  model  using  an  alternative 
criterion  based  upon  Csiszar’s  information  divergence. 


The  MRF  Product  Model 

Let $K$ denote a set of pixel indices. We will denote
the radar image by $Y$, where $Y = \{Y_k;\ k \in K\}$ is the
lexicographically-ordered vector of complex pixel amplitudes
in the image. The model we propose is: $Y_k = S_k T_k$, $k \in K$,
where $S$ is a spatially-white, zero-mean complex circular
Gaussian process with identity covariance and $T$ is constructed
as follows.

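Define the power-law transformation:

$$v_\lambda(y) \overset{\text{def}}{=} \begin{cases} (y^\lambda - 1)/\lambda, & y > 0,\ \lambda > 0 \\ \log(y), & y > 0,\ \lambda = 0 \end{cases} \qquad (1)$$

Equation (1) is typically referred to as the Box-Cox transformation
[2]. The term $1/\lambda$ is included to ensure the continuity
in $\lambda$ of $v_\lambda(y)$ at $\lambda = 0$.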
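
A minimal sketch of the Box-Cox transformation and its inverse follows; it also lets one check the continuity at $\lambda = 0$ numerically. The function names `box_cox` and `box_cox_inverse` are hypothetical, and the code handles scalars only.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a positive number y with parameter lam >= 0."""
    if y <= 0:
        raise ValueError("y must be positive")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def box_cox_inverse(x, lam):
    """Inverse transform; for lam > 0 it requires lam * x + 1 > 0."""
    if lam == 0:
        return math.exp(x)
    return (lam * x + 1.0) ** (1.0 / lam)


if __name__ == "__main__":
    for lam in (0.0, 1e-8, 0.5, 1.0):
        print(lam, box_cox(2.0, lam), box_cox_inverse(box_cox(2.0, lam), lam))
```

For `lam = 1` the transform is the shift `y - 1`, which corresponds to the Gaussian special case of the texture model.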

Suppose that there exists a Gaussian MRF $X \sim N(\mu, \Sigma)$
such that

$$X = v_\lambda(T) \sim N(\mu, \Sigma), \qquad (2)$$

where the transform $v_\lambda$ is applied on a pixel-by-pixel basis.
This latter relationship then specifies a spatial model for $T$
that is a non-Gaussian Markov random field (but includes the
Gaussian model as a special case when $\lambda = 1$):

$$T = v_\lambda^{-1}(X). \qquad (3)$$

That $T$ is Markov follows by noting that $v_\lambda$ is a continuous
one-to-one transformation from $\mathbb{R}_+$ to $\mathbb{R}$ and that $X$ is an
MRF.

Parameters of the product model to be estimated are the
mean and covariance of the Gaussian MRF, $\mu$ and $\Sigma$, and
the Box-Cox parameter $\lambda$. We will denote these parameters
by $\theta$.

Parameter  Estimation  Approach 

Let $D(f_S \| f_{S|Y})$ be the information divergence between
$f_S$ and $f_{S|Y}$. Application of this criterion for parameter
estimation in the MRF product model is based upon the
following heuristic. One can view the speckle process $S$ as a
“noise” process and $f_{S|Y}(s|y; \theta)$ as an estimate of the pdf of
this noise process. Minimization of the divergence then is in
some sense equivalent to choosing the parameter $\theta$ for which
the residual speckle term, as predicted by $f_{S|Y}$, is “white”,
i.e., is a best match for our iid speckle model, $f_S$.

Maximization of the following criterion:

$$\log f_Y(y; \theta) - D\left(f_S(s) \,\|\, f_{S|Y}(s|y; \theta)\right) \qquad (4)$$

results in an alternative to the maximum-likelihood estimator
(MLE) that is computationally simpler to evaluate than
the MLE. This alternate criterion results in an estimator $\hat\theta$
that simultaneously maximizes the likelihood function and
minimizes the information divergence between $f_S$ and $f_{S|Y}$.
We  investigate  properties  of  this  estimator  both  theoretically 
and  via  simulation. 

The output of a finite state machine is a collection of codewords
that can be searched efficiently to find the optimum codeword with
respect to any nonnegative measure that can be calculated on a symbol
by symbol basis. One recent application of this principle is the trellis
coded quantization work of Marcellin and Fischer where the measure
is mean squared error (m.s.e.). A second application, closely related
to the latter, is the trellis shaping work of Forney. Trellis shaping
is a sequence based technique for decreasing the average transmitted
signal power in a communications system, and in this application, the
measure is the power of an individual signal point. Both applications
involve representing a source sequence $x$ as the sum of a codeword $c$
and an error sequence $e = (e_i)$. In quantization, the objective is the
codeword $c$, and the expected value $E(e_i^2)$ is the mean squared error
(per dimension). In trellis shaping the objective is the error sequence $e$.
The signal constellation will be the error sequences $e$ that result from a
suitably chosen discrete set of source sequences $x$. Here the expected
value $E(e_i^2)$ will determine the extent to which average transmitted
signal power is reduced. This correspondence between vector
quantization and the design of finite dimensional signal constellations is
apparent in the work of Conway and Sloane. The extension of this
correspondence to sequence based methods of quantization and
shaping has been described by Forney.

We calculate the per-dimension mean squared error $\mu(S)$ of the
2-state convolutional code $C$ with generator matrix $[1, 1 + D]$, for the
symmetric binary source $S = \{0,1\}$, and for the uniform source $S =
[0,1]$. When $S = \{0,1\}$, the quantity $\mu(S)$ is the second moment
of the coset weight distribution, which gives the expected Hamming
distance of a random binary sequence from the code. When $S = [0,1]$,
the quantity $\mu(S)$ is the second moment of the Voronoi region of the
modulo 2 binary lattice determined by $C$. The key observation is that
a convolutional code with $2^\nu$ states gives $2^\nu$ approximations to a given
source sequence, and these approximations do not differ very much. It
is possible to calculate the steady state distribution for the differences
in these path metrics, and hence the second moment. In this paper
we shall only give details for the convolutional code $[1, 1 + D]$, but the
method applies to arbitrary codes.

We also define the covering radius of a convolutional code, and
calculate this quantity for the code $[1, 1 + D]$.
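
The trellis search mentioned at the start of this abstract can be sketched for the 2-state code $[1, 1 + D]$ with Hamming distance as the symbol-by-symbol measure. This is an illustrative Viterbi search, not the steady-state calculation of the paper. It assumes the encoder starts in state 0 and places no constraint on the final state; the names `encode` and `viterbi_min_hamming` are hypothetical.

```python
def encode(bits):
    """Encoder for generator matrix [1, 1 + D]: outputs (u_t, u_t XOR u_{t-1})."""
    prev = 0
    out = []
    for u in bits:
        out.append((u, u ^ prev))
        prev = u
    return out


def viterbi_min_hamming(received):
    """Find the codeword closest in Hamming distance to a list of bit pairs.

    The trellis state is the previous input bit. Returns
    (minimum distance, input bits of the closest codeword).
    """
    inf = float("inf")
    metric = [0, inf]  # start in state 0
    paths = [[], []]
    for r1, r2 in received:
        new_metric = [inf, inf]
        new_paths = [None, None]
        for s in (0, 1):
            if metric[s] == inf:
                continue
            for u in (0, 1):
                # Branch output is (u, u XOR s); the next state is u.
                d = metric[s] + (u != r1) + ((u ^ s) != r2)
                if d < new_metric[u]:
                    new_metric[u] = d
                    new_paths[u] = paths[s] + [u]
        metric, paths = new_metric, new_paths
    best = 0 if metric[0] <= metric[1] else 1
    return metric[best], paths[best]


if __name__ == "__main__":
    word = encode([1, 0, 1, 1, 0])
    print(viterbi_min_hamming(word))
    word[2] = (word[2][0] ^ 1, word[2][1])  # flip one bit
    print(viterbi_min_hamming(word))
```

Averaging the returned distance over random binary input pairs gives an empirical estimate of the expected Hamming distance of a random sequence from the code.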

Abstract. In our previous study, it was shown that good high-rate punctured
convolutional codes (PCCs) in the class $\Sigma_f$ ($\Sigma_f$-PCC) can be systematically
searched. Letting $P_\Sigma = \{P_0, P_1, \ldots, P_{n-1}\}$ be a set of $n$ different generators for
a $\Sigma_f$-PCC, we construct a rate-variable PCC by using only the generators in
$P_\Sigma$. For constraint lengths 7, 8, and 9, we have found new good $k/(k+1)$
rate-variable PCC systems that provide good BER performance for $k = 1, 2, \ldots, 7$.

Introduction 

Punctured convolutional codes (PCCs) [3] are a class of high-rate
convolutional codes obtained by periodically puncturing the outputs
of a low-rate encoder. The Viterbi decoder of a PCC is much
simpler than that of the usual high-rate codes. However, because of
the lack of mathematical structure, good high-rate PCCs could not
be efficiently searched by a systematic algorithm. This problem
was solved by the introduction of $\Sigma_f$ [1]. The punctured convolutional
encoder for a $\Sigma_f$ code, $\Sigma_f$-PCC, can be obtained by a systematic
search algorithm.
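
As a generic illustration of puncturing (not one of the codes found in this paper), the sketch below encodes with a rate-1/2 feedforward encoder and then deletes output bits according to a periodic pattern to obtain rate 2/3. The generators (7, 5) in octal and the pattern `RATE_2_3` are textbook-style assumptions chosen for the example; the function names are hypothetical.

```python
def conv_encode(bits, gens=(0o7, 0o5), memory=2):
    """Rate-1/n feedforward convolutional encoder.

    gens are tap masks over (current bit, previous bits), with the current
    bit in the most significant position. Returns one n-tuple per input bit.
    """
    state = 0
    out = []
    for u in bits:
        reg = (u << memory) | state
        out.append(tuple(bin(reg & g).count("1") % 2 for g in gens))
        state = reg >> 1
    return out


def puncture(symbols, pattern):
    """Keep output bit j at step t only where pattern[t % period][j] == 1."""
    kept = []
    for t, sym in enumerate(symbols):
        mask = pattern[t % len(pattern)]
        kept.extend(b for b, keep in zip(sym, mask) if keep)
    return kept


# Period-2 pattern: 2 input bits produce 3 transmitted bits, i.e. rate 2/3.
RATE_2_3 = [(1, 1), (1, 0)]

if __name__ == "__main__":
    mother = conv_encode([1, 0, 0, 0])
    print(mother)
    print(puncture(mother, RATE_2_3))
```

Changing only the pattern changes the rate while the same trellis is reused for decoding, which is what makes rate-variable PCC systems attractive.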

The conventional method to construct rate-variable PCCs is to
search good PCCs for different coding rates, restricting the $n$ generators
to those for an optimal rate 1/n code. In this paper, rather than
using the generators of a low-rate code, we construct a rate-variable
PCC by using different generators for a good high-rate $\Sigma_f$-PCC.

Background 

Yamada et al. [2] proposed a maximum likelihood decoding technique,
called the YHM algorithm. This is a breakthrough over the
inherent difficulties in decoding of any high-rate convolutional code.
The idea of the YHM algorithm is to divide the trellis diagram of the
syndrome-former of a rate $k/(k+1)$ code into $k + 1$ stages such that
there are only two branches or less entering each state.

$\Sigma$ is defined in [1] as a class of $(k+1, k, \nu)$ convolutional codes
having $n_\nu < 2$, where $n_\nu$ is the number of polynomials $H_i(D)$ having
$\deg[H_i(D)] = \nu$ in the parity-check matrix $H(D)$. $\Sigma$ codes can be efficiently
decoded by the YHM algorithm. In general, however, the trellis of a $\Sigma$
code has a time-varying branch structure. $\Sigma_f$ is a particular class of
$\Sigma$ codes that can be decoded by the fixed branch structure trellis.
Since $\Sigma$ is defined by the parity-check matrix, it can be efficiently
constructed by a systematic algorithm. Several good high-rate $\Sigma$ and
$\Sigma_f$ codes of $d_{free} = 4, 5, \ldots, 8$ have been reported in [1].

In [1], it is pointed out that the trellis in the YHM algorithm of
a $(k+1, k, \nu)$ $\Sigma_f$ code is exactly the same as the trellis in the Viterbi
algorithm for a $(k+1, k, \nu)$ PCC. Hence, the punctured convolutional
encoder for $\Sigma_f$ can be derived from the trellis of a $\Sigma_f$ code. The PCCs
obtained by this method are called $\Sigma_f$-PCCs.

Code  Search  Results 

Letting $P_\Sigma = \{P_0, P_1, \ldots, P_{n-1}\}$ be a set of $n$ different generators
for a $\Sigma_f$-PCC, we construct a rate-variable PCC by using only the
generators in $P_\Sigma$.

Limited searches are conducted on the set of codes having constraint
lengths $\nu = 7$, 8 and 9 to construct $k/(k+1)$ rate-variable
PCCs that give good BER performance for $1 \le k \le 7$. For $\nu = 7$ and
$\nu = 8$, partial searches were performed in the set of rate 6/7 $\Sigma_f$-PCCs
achieving $d_{free} = 5$ and $d_{free} = 6$, respectively. At $\nu = 9$ a partial
search was performed in the set of rate 3/4 $\Sigma_f$-PCCs achieving
$d_{free} = 8$. The $d_{free}$ and the first five terms of the weight spectra of the
obtained rate-variable PCC systems are listed in Table 1. For comparison,
the parameters of the best known rate-variable PCC systems [3], [4]
are also listed in Table 1. The newly obtained rate-variable coding
systems give moderate BER performance at low rate. At high rate,
these new systems give significantly better BER performance than
that of the best known rate-variable systems.


Table 1: Rate-variable PCC systems


Constraint length ν = 7

Rate  Code                        d_free  Weight spectrum (first five terms)
1/2   (P).p«)*                    10      10, 0, 73, 0, 687
      IPo.Pl)^                    10      2, 23, 62, 165, 404
2/3   (Pl.p«).«                   7       45, ^06, 891, 4076, 18052
      (P>.PI).P>                  8       395, 0, 6695, 0, 135288
3/4   (Pr.Pcl.Pl.ft               6       100, 585, 3839, 24570, 155815
      (Pj.TM.Pi.P?-               6       67, 651, 4008, 24638, 153642
4/5   (Pl.Prf.Fi.Pi.p?            6       1899, 0, 130944, 0, 8065820
      (Po.Pi).Pi.Pi.Pf            5       93, 873, 7017, 59170, 482219
5/6   (Po.Pi).P..Pi.Pl.«          5       329, 383438819, 385064, 3716879
      (Pi.Pil.Pi.ft.Pl.Pf         5       366, 4287, 4436, 423337, 4009089
6/7   (Pl.P«).P).7ft.Pl,fi.Pr-    5       723, 10310, 123861, 1459133, 16002878
      (Pl.Pil.Pl.Pl./^.Pl.Pf      4       17, 1008, 12651, 152171, 1780906
7/8   in>.P4).Pr.P>.Pi,ft.Pi.Pf   4       39, 1863, 31388, 444036, 5963198
      (fl>.Pi).Pi.Pi,ft,ft.P).P?  4       77, 2122, 32024, 455479, 6099937

Constraint length ν = 8

Rate  Code                        d_free  Weight spectrum (first five terms)
1/2   (Pi.Pi)‘                    12      67, 0, 472, 0, 3363
      (Po.Pi)'-                   11      3, 26, 53, 150, 379
2/3   (Pl.Ps).pf                  8       128, 0, 3490, 0, 62693
      (Pl.pll.pf                  8       109, 0, 2966, 0, 56458
3/4   (Pb.P.I.Pj./?               6       27, 610, 3196, 17838, 110761
      (Pi.PiLPi.Pf                6       52, 490, 2902, 17935, 109020
4/5   (Pi.A).P>.P..F{             6       74^, 0, 61866, 0, 3838837
      (Pb.Pil.Pr.Pb.P?            5       12, 196, 2dI3, 208'd, 169543
5/6   (Pb.P.l.Pi.Po.Pi./?         6       itiO, 0, 312103, 0, 28775304
      (fli.Prf.Pj.Pi.Pi.P?’       5       152, 1688, 18092, 182519, 17:6286
6/7   (a.AkPr.Pi.Pi.Pi.p/-        6       9*i8, 0, 1437137, 0, 183683646
      (P).P)).P>.Pi.ft,Pi.P?      4       4, 462, 6229, 73501, 879798
7/8   (P>.P>).P..Pl.A.ft.P6.Pj    h       931, 14309, 200512, 2749601, :«97-l 1 05
      (Po.Pil.Pr.Pi.ft.Pi.Pi.Pf   4       13, 1572, 23200, 317333, 4268249

Constraint length ν = 9

Rate  Code                        d_free  Weight spectrum (first five terms)
1/2   iPt.p))’                    12      •20, 0, 170, 0, 1116
      (Pb.Pi)"                    12      14, 26, 74, 257, 496
2/3   (P).P().pJ                  8       54, 0, 1720, 0, 30595
      (p,.Pi).p»                  7       3, 70, 207, 836, 4411
3/4   (Pi.Pii.ft.p?-              8       2118, 0, 78546, 0, 2915853
      (Pi.Pil.Pr.P?'              6       38, 270, 1640, 10554, 63601
4/5   (P).P,).P,./^.PJ            6       291, 0, 25235, 0, 16118071
      (fl,.Pi).P,.P|.P»           4       6, 3, 298, 2604, 19132
5/6   (P..P>).P).P,.P.Pj>         6       2180, 0, 226105, 0, 20626620
      (P,.Pi).P,,P,,P>,pw         5       201, 2104, 22183, 217194, -204 U91
6/7   (Pi.Pii.Pr.Pi.P.ft.P,?      5       94, 4027, 36019, 511214, 5033081
      (Po.PiLPo.Po./^.Pi.Pr       5       267, 5105, 56285, 656627, 74:1387 1
7/8   (Pj.Pol.Po.Po.Pi.Pi,Pj.p?   5       662, 10320, 150932, 2033984, 2651259r
      (Pi.Prl.Pi.Pi.Pi.Pi.Pb.P?'  4       3, 690, 10528, 150237, 2007749

Code selected from $P_\Sigma = \{P_0, \ldots, P_6\}$ = {337, 227, 221, 207, 215, 327, 255},
{4«1, 5«3, 537, 575, 673, 671, 6n}.

7: Best code selected from $P_\Sigma = \{P_0, P_1, P_2, P_3\}$ = {H«7, 1317, 1037, 1725}.

*^: Best code selected from code generator sets $\{P_0, P_1, P_2, P_3\}$ = {337, 251, 237, 235}, {765,
473, 463, 4$7}, found in [4].

ff: Best code selected from code generator set $\{P_0, P_1\}$ = {1167, 1545}, found in [2].

•: $\Sigma_f$-PCC.

Abstract

A  new  formula  is  derived  to  compute  the  codeword  weights 
of rate 1/n convolutional codes. This formula allows us to derive
a  new  asymptotic  lower  bound  on  the  row  distance  that  surpris¬ 
ingly  reaches  the  upper  bound  on  free  distance  in  the  limit  of 
large  memory  m.  This  formula  also  leads  to  a  new  approach  for 
constructing  finite  constraint  length  convolutional  codes. 

Summary 

Costello [1] and Zigangirov and Massey [2] have derived lower
and upper bounds on the free distance of fixed and time-varying
convolutional codes by deriving bounds on ensemble averages.
In this paper, we investigate a new approach to deriving an
asymptotic lower bound on the row distance of order $l$ of rate
1/n convolutional codes, i.e., the lowest weight of codewords
generated by information sequences of length less than or equal
to $l + 1$. Although this bound is valid only asymptotically, it
suggests  that  a  similar  bound  might  also  be  found  for  finite  con¬ 
straint  lengths,  and  it  leads  to  a  new  approach  for  constructing 
finite  constraint  length  convolutional  codes. 

Suppose $x, y$ are two elements from the binary field $F =
\{0,1\}$, $\oplus$ denotes addition in the binary field, and $+$ denotes
addition in the integers $\mathbb{Z}$. Then

$$x \oplus y = x + y - 2xy. \qquad (1)$$
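
Identity (1) can be checked exhaustively, and applying it repeatedly expresses the XOR of any number of bits using integer arithmetic only. The short sketch below does both; the three-term expansion is the kind of alternating sum that the row distance formula builds on. The function names are hypothetical.

```python
from itertools import product


def xor_via_integers(bits):
    """XOR of a sequence of bits using identity (1) with integer arithmetic."""
    acc = 0
    for b in bits:
        acc = acc + b - 2 * acc * b
    return acc


def xor3_expanded(x, y, z):
    """Expanded form for three bits: sum, minus 2 * pairs, plus 4 * triple."""
    return x + y + z - 2 * (x * y + x * z + y * z) + 4 * x * y * z


if __name__ == "__main__":
    for x, y, z in product((0, 1), repeat=3):
        print(x, y, z, xor_via_integers((x, y, z)), xor3_expanded(x, y, z))
```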

In order to derive our new formula on the row distance of
rate 1/n convolutional codes, we need the following definitions:

Definition 1. Let $a(X)$ be a polynomial with coefficients $a_i$.
Then, we define the $k$th correlation coefficients as

$$R_{ka}(j_1, \ldots, j_k) = \sum_{i=0}^{\infty} a_i\, a_{i+j_1} \cdots a_{i+j_1+\cdots+j_k} \qquad (2)$$

where $j_1, j_2, \ldots, j_k$ are $k$ integers strictly greater than 0.

Definition 2. Let $g^{(1)}(X), \ldots, g^{(n)}(X)$ be the $n$ degree-$m$
generator polynomials of a rate 1/n convolutional code $C$.
Then, let $G(X)$ be the composite generator polynomial

$$G(X) = g^{(1)}(X^n) + X g^{(2)}(X^n) + \cdots + X^{n-1} g^{(n)}(X^n). \qquad (3)$$

Then, for any information sequence $u(X)$, the code sequence
$v(X)$ is generated by

$$v(X) = u(X^n)\,G(X). \qquad (4)$$

Using the previous definitions, we can obtain the following
theorem on computing the row distance of rate 1/n convolutional
codes.

¹This work was supported by NSF Grant NCR89-03429 and by NASA
Grant NAG5-557.

Theorem 1. Let $C$ be a rate 1/n convolutional code with composite
generator polynomial $G(X)$. Then the row distance of
order $l$ of $C$ can be computed as:

$$d_l = \min_{\deg u(X) < l+1} \Big( R_{0u} R_{0G} - 2 \sum_{j=1}^{\infty} R_{1u}(j)\, R_{1G}(nj) + 4 \sum_{j,k=1}^{\infty} R_{2u}(j,k)\, R_{2G}(nk, nj) - \cdots \Big) \qquad (5)$$
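
The expression inside the minimum of (5) is the Hamming weight of the codeword generated by $u(X)$, written as an alternating sum of products of correlation coefficients. The sketch below evaluates that sum for a given binary $u$ and $G$ and compares it with the weight obtained by direct encoding over the binary field. It is a brute-force check under stated assumptions, not an efficient algorithm: polynomials are plain coefficient lists (lowest degree first), the example composite generator is built from the illustrative rate-1/2 pair $1 + X + X^2$ and $1 + X^2$, and the function names are hypothetical. Note that the lag arguments of the $G$ correlation appear in reversed order, as in the third term of (5).

```python
from itertools import product


def corr(a, lags):
    """Correlation coefficient of Definition 1 for coefficient list a.

    With no lags this is the sum of the coefficients; otherwise each term is
    a_i * a_{i+j1} * ... * a_{i+j1+...+jk}, with out-of-range factors zero.
    """
    total = 0
    for i in range(len(a)):
        term, idx = a[i], i
        for j in lags:
            idx += j
            term *= a[idx] if idx < len(a) else 0
        total += term
    return total


def weight_by_formula(u, G, n):
    """Codeword weight from the alternating correlation sum inside (5)."""
    length = len(u)
    weight = 0
    for k in range(length):  # a product of more than len(u) factors of u vanishes
        for lags in product(range(1, length), repeat=k):
            ru = corr(u, lags)
            if ru:
                g_lags = [n * j for j in reversed(lags)]
                weight += (-2) ** k * ru * corr(G, g_lags)
    return weight


def weight_direct(u, G, n):
    """Weight of v(X) = u(X^n) G(X) computed over the binary field."""
    v = [0] * (n * (len(u) - 1) + len(G))
    for i, ui in enumerate(u):
        if ui:
            for s, gs in enumerate(G):
                v[n * i + s] ^= gs
    return sum(v)


if __name__ == "__main__":
    G = [1, 1, 1, 0, 1, 1]  # 1 + X + X^2 + X^4 + X^5, composite of the example pair
    for u in product((0, 1), repeat=3):
        print(u, weight_by_formula(list(u), G, 2), weight_direct(list(u), G, 2))
```

Minimizing `weight_by_formula` over the admissible information sequences gives the row distance; the direct computation is included only as a cross-check.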

(5) gives a general formula for computing the row distance
of order $l$ of a rate $1/n$ convolutional code. This formula can be
simplified when $m$ goes to infinity. Specifically, let us construct
our generator polynomial by randomly selecting its coefficients
from $F$, that is:

$$G(X) = \sum_{i=0}^{n(m+1)-1} g_i X^i \tag{6}$$

where $g_i \in F = \{0,1\}$ and $\Pr(g_i = 0) = \Pr(g_i = 1) = 1/2$ for
any integer $i \ge 0$. For these randomly constructed codes, the
following theorem can be derived:

Theorem 2. Let $G(X)$ of degree $n(m+1) - 1$ be the composite
generator polynomial of a randomly constructed rate $1/n$
convolutional code with memory order $m$. Then, for any finite
order $l$, with probability 1,

$$\lim_{m \to \infty} \frac{d_l}{n(m+1)} = \frac{1}{2}. \tag{7}$$

Thus, by taking a random generator $G(X)$, with probability 1
the row distance of any finite order $l$ is on the order of
$n(m+1)/2$ as $m$ goes to infinity. Since there exists a large number
of randomly generated codes, (7) represents a lower bound
on the row distance of rate $1/n$ convolutional codes. This bound
is significant since it implies that there exist codes for which
the row distance of any finite order $l$ reaches the asymptotic
upper bound on free distance derived by Costello [1]. Since the
bound is only valid for $l < m$, however, it is not a bound on
$d_{free} = \lim_{l \to \infty} d_l$. But, for almost all known codes, the row
distance reaches the free distance within the first constraint length,
suggesting that it may be possible to strengthen existing lower
bounds on the free distance.
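
The behaviour described by Theorem 2 can be observed empirically. The sketch below draws a random composite generator of length $n(m+1)$, finds the row distance of a small order $l$ by brute force, and returns its ratio to $n(m+1)$. It is a Monte Carlo illustration under our own choice of parameters and seed, not a proof, and the function name is hypothetical.

```python
import random


def row_distance_ratio(n, m, l, seed=0):
    """Return d_l / (n(m+1)) for one randomly drawn composite generator."""
    rng = random.Random(seed)
    length = n * (m + 1)
    G = [rng.randint(0, 1) for _ in range(length)]
    best = None
    for tail in range(2 ** l):
        # Information sequences with u_0 = 1 and degree at most l.
        u = [1] + [(tail >> b) & 1 for b in range(l)]
        v = [0] * (n * l + length)
        for p, bit in enumerate(u):
            if bit:
                for i, g in enumerate(G):
                    v[n * p + i] ^= g
        w = sum(v)
        best = w if best is None else min(best, w)
    return best / length


ratio = row_distance_ratio(n=2, m=499, l=2)
```

For large $m$ and small $l$ the ratio should sit close to, and slightly below, $1/2$, since the minimum is taken over a handful of nearly balanced random sequences.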

Although the bound derived in Theorem 2 is only valid as $m$
goes to infinity, it is possible to use Theorem 1 for constructing
rate $1/n$ convolutional codes. We note that a code with large
free distance requires the optimization of the functional given
by (5). Thus, different algorithms for seeking optimization of a
functional can be used to construct generators of rate $1/n$
convolutional codes. A large number of convolutional codes
constructed in this way have free distances close to optimal codes,
and the algorithms allow us to construct codes with much higher
constraint lengths than previously constructed codes.
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Abstract 

Well  known  techniques  from  linear  block  codes  analysis  are  used 
to  study  convolutional  codes.  It  is  demonstrated  how  these  methods 
can  be  used  to  determine  bounds  on  convolutional  codes  of  certain 
parameters. 

Introduction 

An $(n, k, \nu, d_{free})$ code is a convolutional code of block length $n$,
dimension $k$, constraint length $\nu$, and free distance $d_{free}$. The code
can be generated by a $k \times n$ matrix $G(D)$ (an encoder matrix), in
which each entry is a polynomial in $D$. The $i$th constraint length $\nu_i$
is the maximum degree of a polynomial of the $i$th row of $G(D)$, and
$\nu = \sum_{i=1}^{k} \nu_i$. Without loss of generality we can assume
$\nu_i \le \nu_{i+1}$ for $1 \le i < k$. We call the vector
$(\nu_1, \ldots, \nu_k)$ the constraint length vector.

In [1], $N(r, \nu, d_{free})$ is defined as the largest $n$ such that an
$(n, n - r, \nu, d_{free})$ code exists, and bounds on this function are
developed.

A particularly useful upper bound on $N(r, \nu, d_{free})$, originally
due to Heller [2], arises from the fact that a set of convolutional code
words of bounded length is a linear block code, called a terminated
code [3].

Lemma ([2]). Let $C$ be an $(n, k, \nu, d)$ convolutional code with
constraint length vector $(\nu_1, \ldots, \nu_k)$. Then for all $j \ge 1$, $T_j$ is a
$[jn, K(j), d]$ linear block code, where

$$K(j) = \sum_{i:\, j > \nu_i} (j - \nu_i).$$
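
The dimension $K(j)$ of the terminated code depends only on the constraint length vector. A small sketch follows (the function name is ours); it uses the constraint length vector $(0,0,1,1,1,1,1)$ that appears in the example later in this summary.

```python
def terminated_dimension(constraint_lengths, j):
    """K(j): sum of (j - nu_i) over the rows whose constraint length is below j."""
    return sum(j - nu for nu in constraint_lengths if j > nu)


vector = (0, 0, 1, 1, 1, 1, 1)       # overall constraint length 5, k = 7
k1 = terminated_dimension(vector, 1)  # dimension of T_1 (length 9 when n = 9)
k2 = terminated_dimension(vector, 2)  # dimension of T_2 (length 18 when n = 9)
```

With $n = 9$ these values give the $[9,2,6]$ and $[18,9,6]$ terminated codes used in the nonexistence proof.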

In many cases the methods in [1] are insufficient to determine
$N(r, \nu, d_{free})$ exactly. The purpose of this talk is to demonstrate that
a detailed analysis of the terminated block codes sometimes provides
either proofs of nonexistence of, or suggestions on how to construct,
the associated convolutional codes. This analysis employs standard
techniques from block code analysis.

An  Example 

In [1], an (8,6,5,6) code was obtained by computer search. Below
follows a proof for the nonexistence of (9,7,5,6) codes.

Lemma 1. Suppose $T$ is an [18,9,6] linear block code. Then the
weight distribution of $T$ is

$$A_0 = A_{18} = 1, \quad A_6 = A_{12} = 102, \quad A_8 = A_{10} = 153. \tag{1}$$

Proof. Using well-known techniques, it is possible to show that the
minimum distance of $T^\perp$ is 6, and that no code word in $T$ has weight
7. Then (1) is the only positive integer solution to the MacWilliams
identities.

□
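
The weight distribution in Lemma 1 can be sanity-checked with the MacWilliams transform: the counts must sum to $2^9$, and the transformed distribution (that of the dual code) must have no words of weight 1 through 5. The sketch below uses the standard binary Krawtchouk polynomials; the function names are ours.

```python
from math import comb


def krawtchouk(n, j, i):
    """Binary Krawtchouk polynomial K_j(i) for length n."""
    return sum((-1) ** s * comb(i, s) * comb(n - i, j - s) for s in range(j + 1))


def dual_weight_distribution(A, k):
    """MacWilliams transform of the weight distribution A of an [n, k] code."""
    n = len(A) - 1
    B = []
    for j in range(n + 1):
        total = sum(A[i] * krawtchouk(n, j, i) for i in range(n + 1))
        B.append(total // 2 ** k)
    return B


A = [0] * 19
A[0] = A[18] = 1
A[6] = A[12] = 102
A[8] = A[10] = 153
B = dual_weight_distribution(A, 9)
```

For this distribution the transform returns the same list, which is consistent with the dual of $T$ also having minimum distance 6.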

Theorem 2. There is no (9,7,5,6) convolutional code.

Proof. Suppose $C$ is a (9,7,5,6) convolutional code, and that $G(D)$
is a minimal encoder for $C$. We can always assume that the
constraint length vector is ordered. Hence, the only possible constraint
length vector is (0,0,1,1,1,1,1), otherwise the parameters of $T_1$ are
$[9, k\ (\ge 3), 6]$, but this is impossible by the Griesmer bound. This
implies that $T_1$ is a [9,2,6] code, and it is easy to show that the upper
left corner $2 \times 9$ submatrix of the binary encoder $G(D)$ is equivalent
(except for column permutations) to

$$\begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & 1 \end{pmatrix},$$

thus $T_1$ does not contain the all-one code word. Further, $T_2$ is an
[18,9,6] code. From Lemma 1, it follows that $T_2$ contains the
all-one code word, so a sequence of suitable row operations will
transform $G(D)$ into another encoder $G^*(D)$ which (i) has the same
or smaller constraint length as $G(D)$, and (ii) contains a row of the
form $(1 + D)(1,1,1,1,1,1,1,1,1)$. (i) implies that $G^*(D)$ is also
a minimal encoder. However, (ii) contradicts this (see, for instance,
[4]).

□
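
The Griesmer bound step in the proof is a one-line computation: a binary $[n, k, d]$ linear code needs $n \ge \sum_{i=0}^{k-1} \lceil d / 2^i \rceil$. A minimal sketch (function name ours):

```python
def griesmer_length(k, d):
    """Griesmer lower bound on the length of a binary [n, k, d] linear code."""
    return sum(-(-d // 2 ** i) for i in range(k))  # ceil(d / 2^i), summed


needed_k3 = griesmer_length(3, 6)  # length needed for dimension 3, distance 6
needed_k2 = griesmer_length(2, 6)  # length needed for dimension 2, distance 6
```

Dimension 3 with distance 6 already requires length 11, so a length-9 terminated code $T_1$ can have dimension at most 2, which forces the constraint length vector used in the proof.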

In other cases, similar methods have made it possible to actually
construct convolutional encoders of certain parameters. Thus, for
instance, it is possible

• to construct a (9,5,4,8) code through analysis of [18,6,8] linear
block codes, and

• to construct a (6,2,6,16) code through analysis of [30,4,16] linear
block codes.

These results determine $N(r, \nu, d_{free})$ exactly in the respective cases.
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This  talk  describes  methods  for  analyzing  the  expected  and  worst- 
case performance of sequence based methods of quantization. We
suppose that the quantization algorithm is dynamic programming, where
the current step depends on a vector of path metrics, which we call
a metric function. Our principal objective is a concise representation
of these metric functions and the possible trajectories of the dynamic
programming algorithm.

We shall consider quantization of equiprobable binary data using a
convolutional code. Here the additive group of the code splits the set
of metric functions into a finite collection of subsets. The subsets form
the vertices of a directed graph, where edges are labelled by aggregate
incremental increases in mean squared error (mse). Paths in this graph
correspond both to trajectories of the Viterbi algorithm, and to cosets
of the code. For the rate 1/2 convolutional code $[1 + D^2, 1 + D + D^2]$
this graph has only 9 vertices. In this case it is particularly simple to
calculate per dimension expected and worst case mse, and performance
is similar to the binary [24, 12] Golay code.
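
To make the notion of a metric function concrete, here is a minimal sketch of Viterbi quantization of binary data with a 4-state rate 1/2 code. We assume the generators are $1 + D^2$ and $1 + D + D^2$ and that the encoder starts in the all-zero state; distortion is measured as Hamming distance, and all names are ours. The dictionary `metric` is the vector of path metrics that the dynamic programming step depends on.

```python
from itertools import product

INF = float("inf")


def branch(state, bit):
    """One trellis step. state = (previous input, the input before that)."""
    b1, b2 = state
    return (bit, b1), (bit ^ b2, bit ^ b1 ^ b2)


def viterbi_quantize(target):
    """Minimum Hamming distortion between target (a list of bit pairs) and
    any code sequence that starts in the all-zero state."""
    states = list(product((0, 1), repeat=2))
    metric = {s: (0 if s == (0, 0) else INF) for s in states}
    for r1, r2 in target:
        new = {s: INF for s in states}
        for s in states:
            if metric[s] == INF:
                continue
            for bit in (0, 1):
                nxt, (o1, o2) = branch(s, bit)
                cost = metric[s] + (o1 != r1) + (o2 != r2)
                new[nxt] = min(new[nxt], cost)
        metric = new  # updated metric function
    return min(metric.values())


def brute_force_quantize(target):
    """Same minimum, by trying every input sequence (small cases only)."""
    best = INF
    for u in product((0, 1), repeat=len(target)):
        state, dist = (0, 0), 0
        for bit, (r1, r2) in zip(u, target):
            state, (o1, o2) = branch(state, bit)
            dist += (o1 != r1) + (o2 != r2)
        best = min(best, dist)
    return best
```

Subtracting the smallest entry from `metric` after each step leaves the decisions unchanged, which is one way to see why only finitely many classes of metric functions can occur.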

Our  methods  also  apply  to  quantization  of  arbitrary  symmetric 
probability  distributions  on  [0, 1]  using  convolutional  codes.  For  the 
uniform  distribution  on  [0, 1],  the  expected  mse  is  the  second  moment 
of  the  “Voronoi  region”  of  an  infinite  dimensional  lattice  determined 
by  the  convolutional  code.  It  may  also  be  interpreted  as  an  increase  in 
the  reliability  of  a  transmission  scheme  obtained  by  nonequiprobable 
signalling.  For  certain  convolutional  codes  we  obtain  a  formula  for 
expected  mse  that  depends  only  on  the  distribution  of  differences  for 
a  single  pair  of  path  metrics. 
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Summary. 

In his celebrated paper on the algebraic structure of convolutional
codes, Forney [1] showed that by using the invariant-factor theorem,
one can transform an arbitrary polynomial generator matrix for an
$(n,k)$ convolutional code $C$ into a basic (and ultimately a minimal)
generator matrix for $C$. He also showed how to find a polynomial
inverse for a basic generator matrix for $C$, and a basic generator matrix
for the dual code $C^\perp$. In this paper, we will discuss efficient ways to
do all these things. Our main tool is the "extended invariant factor
algorithm," which we introduce here.

1. The Extended Invariant Factor Algorithm.

The goal of the invariant factor algorithm (see e.g. [2, Sec. 6.2.4], [3,
Sec. 6.3.3], or [4, Sec. 12.2]) is to take an arbitrary $k \times n$ matrix $G$ (with
$k \le n$) over a Euclidean domain $R$, and by a sequence of elementary
row and column operations, to reduce $G$ to a $k \times n$ diagonal matrix
$\Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_k)$, whose diagonal entries are the invariant factors of
$G$, i.e., $\gamma_i = \Delta_i / \Delta_{i-1}$, where $\Delta_i$ is the gcd of the $i \times i$ minors of $G$. The
goal of the extended invariant factor algorithm, which we introduce in
this paper, is to take the same input, and not only find $\Gamma$, but also to
find a $k \times k$ unimodular matrix $X$, and an $n \times n$ unimodular matrix $Y$,
such that $XGY = \Gamma$.

To describe the extended invariant factor algorithm, we need to
take a closer look at the original invariant factor algorithm. Formally,
it can be described as follows. Beginning with the matrix $G_0 = G$, it
produces a sequence of $k \times n$ matrices $G_i$, where $G_{i+1}$ is derived from
$G_i$ by either an elementary row operation or an elementary column
operation. We can represent this algebraically as

$$G_{i+1} = E_{i+1} G_i F_{i+1}, \tag{1.1}$$

where $E_{i+1}$ and $F_{i+1}$ are $k \times k$ and $n \times n$ elementary matrices,
respectively. If $G_{i+1}$ is obtained from $G_i$ via a row operation, then $F_{i+1} = I_n$,
but if $G_{i+1}$ is obtained from $G_i$ via a column operation, then $E_{i+1} = I_k$.
After a finite number $N$ of steps, we obtain $G_N = \Gamma$. (The details of
which elementary row and column operations to perform, and in which
order, are of central importance, of course, but for reasons of space, we
refer the reader to [2, Sec. 6.2.4], or [3, Section 6.3.3] for them.)

The extended invariant factor algorithm builds on the invariant
factor algorithm. In addition to the sequence $G_0, G_1, \ldots, G_N$, the
extended invariant factor algorithm also works with a sequence of
unimodular $k \times k$ matrices $X_0, \ldots, X_N$, and a sequence of unimodular
$n \times n$ matrices $Y_0, \ldots, Y_N$. The sequences $(X_i)$ and $(Y_i)$ are initialized
as $X_0 = I_k$, $Y_0 = I_n$, and updated via the rule (cf. Eq. (1.1))

$$X_{i+1} = E_{i+1} X_i \tag{1.2}$$

$$Y_{i+1} = Y_i F_{i+1}. \tag{1.3}$$

It is a simple matter to prove by induction that

$$X_i G Y_i = G_i \quad \text{for } i = 0, 1, \ldots, N, \tag{1.4}$$

so that specializing (1.4) with $i = N$, we have

$$X_N G Y_N = \Gamma, \tag{1.5}$$

which is the desired "invariant-factor" diagonalization of $G$. A rough
analysis of this algorithm shows that it requires 0(dnk^) polynomial
divisions, or 0(d^nk*) field operations (addition, subtraction,
multiplication, or division in $F$), where $d$ denotes the maximum degree of any
polynomial in $G$.

2. Application to the Analysis of Convolutional Codes.

We define an $(n,k)$ convolutional code $C$ over a field $F$ to be a
$k$-dimensional subspace of $F(D)^n$, where $F(D)$ is the field of rational
functions in the indeterminate $D$ over $F$. A generator matrix for $C$ is a
$k \times n$ matrix with entries in $F(D)$ whose rows form a basis for $C$. Given
an arbitrary generator matrix $G$ for $C$, we can easily transform $G$ to a
generator matrix with polynomial entries by multiplying the $i$th row of
$G$ by the lcm of the denominators of its components. In this section,
we will see how the extended invariant factor algorithm introduced in
Section 1 can be used to transform an arbitrary polynomial generator
matrix for $C$ into a basic generator matrix for $C$. (The transition from
a basic to a minimal generator can, if desired, then be done by the
simple algorithm originally described in [1], or perhaps more lucidly in
Kailath [3, Sec. 6.3.2], where the process is described as "row-reducing"
a polynomial matrix.) We will see that the extended invariant factor
algorithm also produces, more or less for free, a polynomial inverse for
the basic generator matrix, and a basic generator matrix for the dual
code $C^\perp$.

Assume then that $G$ is a $k \times n$ polynomial generator matrix for a
convolutional code $C$ over a field $F$. Since the ring of polynomials over
$F$ is a Euclidean domain, we may apply the extended invariant factor
algorithm described in Section 1, thereby obtaining a decomposition of
the form (1.5). In what follows, the matrices $X_N$ and $Y_N$ produced by
the extended invariant factor algorithm will be denoted simply by $X$
and $Y$.

The matrices $X$, $Y$, and $\Gamma$ contain much valuable information
about the code $C$ and the generator matrix $G$. To extract this
information, however, we need to define several useful "pieces" of these
matrices, which we call $\Gamma_k$, $\hat{\Gamma}_k$, $K$, and $H$:

$$\Gamma_k = \text{leftmost } k \text{ columns of } \Gamma = \mathrm{diag}(\gamma_1, \ldots, \gamma_k). \tag{2.1}$$

$$\hat{\Gamma}_k = \gamma_k \cdot \Gamma_k^{-1} = \mathrm{diag}(\gamma_k/\gamma_1, \ldots, \gamma_k/\gamma_k). \tag{2.2}$$

$$K = \text{leftmost } k \text{ columns of } Y. \tag{2.3}$$

$$H = \text{rightmost } n - k \text{ columns of } Y. \tag{2.4}$$

Here then are useful "outputs" of the extended invariant factor
algorithm, when applied to $G$.

• A basic generator matrix for $C$: $G_{basic} = \Gamma_k^{-1} X G$. (That is, $G_{basic}$
is obtained by dividing the $i$th row of $XG$ by the invariant factor
$\gamma_i$, for $i = 1, \ldots, k$.)

• A polynomial inverse for $G_{basic}$: $K$.

• A polynomial pseudo-inverse for $G$, with factor $\gamma_k$: $K \hat{\Gamma}_k X$.

• A basic generator matrix for $C^\perp$, i.e., parity-check matrix for $C$:
$H$.
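
The bookkeeping of the extended invariant factor algorithm of Section 1 can be sketched in a few lines. The sketch below works over the integers, which are a Euclidean domain, rather than over the polynomial ring $F[D]$ that the paper needs; this substitution is ours, made to keep the example short. The particular pivoting strategy is also our own assumption, since the paper defers those details to its references. Every row operation applied to the working matrix is also applied to $X$, and every column operation to $Y$, so that $XGY$ always equals the current matrix.

```python
def identity(size):
    return [[int(i == j) for j in range(size)] for i in range(size)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def extended_invariant_factors(G):
    """Return (X, Gamma, Y) with X*G*Y = Gamma diagonal, for an integer matrix G."""
    k, n = len(G), len(G[0])
    A = [row[:] for row in G]
    X, Y = identity(k), identity(n)

    def swap_rows(i, j):
        A[i], A[j] = A[j], A[i]
        X[i], X[j] = X[j], X[i]

    def swap_cols(i, j):
        for M in (A, Y):
            for row in M:
                row[i], row[j] = row[j], row[i]

    def add_row(src, dst, c):  # row dst += c * row src, on A and X
        for M in (A, X):
            M[dst] = [d + c * s for d, s in zip(M[dst], M[src])]

    def add_col(src, dst, c):  # column dst += c * column src, on A and Y
        for M in (A, Y):
            for row in M:
                row[dst] += c * row[src]

    def negate_row(i):
        for M in (A, X):
            M[i] = [-v for v in M[i]]

    for t in range(min(k, n)):
        while True:
            nonzero = [(abs(A[i][j]), i, j)
                       for i in range(t, k) for j in range(t, n) if A[i][j]]
            if not nonzero:
                return X, A, Y
            _, pi, pj = min(nonzero)  # smallest nonzero entry becomes the pivot
            swap_rows(t, pi)
            swap_cols(t, pj)
            p = A[t][t]
            for i in range(t + 1, k):
                add_row(t, i, -(A[i][t] // p))
            for j in range(t + 1, n):
                add_col(t, j, -(A[t][j] // p))
            if any(A[i][t] for i in range(t + 1, k)) or \
               any(A[t][j] for j in range(t + 1, n)):
                continue  # a smaller remainder appeared; pick a new pivot
            bad = [i for i in range(t + 1, k)
                   if any(A[i][j] % p for j in range(t + 1, n))]
            if not bad:
                break  # the pivot divides everything that remains
            add_row(bad[0], t, 1)
        if A[t][t] < 0:
            negate_row(t)
    return X, A, Y
```

For example, with `G = [[2, 4, 4], [-6, 6, 12]]` the call returns a diagonal form with invariant factors 2 and 6, and `matmul(matmul(X, G), Y)` reproduces it, which is the integer analogue of (1.5).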
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Abstract 

Consider the transmission of a finitely interleaved rate $1/n$
convolutionally encoded message over a non-memoryless channel
having two internal states $H_0$ and $H_1$ where, when in state $H_0$,
the channel resembles a noiseless Binary Symmetric Channel
(BSC), whereas when in state $H_1$, the channel is totally blocked
and is well approximated by a Binary-Input-Single-Output channel.
Assume that the channel's internal state is drawn at random
once every $h$ channel uses, and then remains constant for the
following $h$ channel uses. Further assume that the message is short
in comparison to $h$, and that due to delay constraints, the message
must be decoded within $Nh$ channel uses, where $N$ need
not be large in comparison to the code's constraint length.

The probability of a message error, the normalized expected
number of bits in error, and the Bit Error Rate (BER) are
analytically computed for the periodic $N \times h$ chip and word
interleavers, where a chip refers to a binary code symbol, and a word
refers to an $n$-tuple of consecutive chips.

An analytic expression for the BER is also given for pseudo-random
word and chip interleavers and for the corresponding
limiting cases of infinite interleaving, i.e., $N \to \infty$.

Summary 

Let $U = (U_0, \ldots, U_{N-1})$ denote an erasure pattern, i.e., an array of
$N$ elements $U_p \in \{0, 1\}$, $0 \le p \le N - 1$. Let $C = (C_0, \ldots, C_{N-1})$ be
an array of $N$ channels where for each $0 \le p \le N - 1$ the channel
$C_p$ is a noiseless binary-input binary-output channel ("noiseless") if
$U_p = 0$ and otherwise, if $U_p = 1$, $C_p$ is "erased", i.e., a binary-input
single-output channel. Thus, if $U_p = 0$ the channel transition
probabilities satisfy $P(1|1) = P(0|0) = 1$, and otherwise, if $U_p = 1$ we have
$P(\text{"?"}|0) = P(\text{"?"}|1) = 1$.

Consider the transmission of an $L$-length binary message over the
array of channels $C$ using a constraint length $K$, rate $1/n$ convolutional
code with zero padding. To fix notations, let $D = (d_0, \ldots, d_{L-1})$, $d_j \in
GF(2)$, be the message of length $L$ which is produced by the source.
Using the $GF(2)$ arithmetic, the encoder's output can be written as

$$c_{i,l} = \sum_{\nu=0}^{K-1} g_\nu^{(l)} \tilde{d}_{i-\nu}, \quad i = 0, \ldots, L + K - 2, \quad l = 1, \ldots, n, \tag{1}$$

where $\tilde{d}_j = d_j$ unless $j < 0$ or $j \ge L$, in which case $\tilde{d}_j = 0$, and
where $g_\nu^{(l)}$, $\nu = 0, \ldots, K - 1$, are the coefficients of the $l$-th generating
polynomial of the convolutional code. We shall use the term word for
an $n$-tuple $(c_{j,1}, \ldots, c_{j,n})$ for some $j \in \{0, \ldots, L + K - 2\}$. Similarly
we shall use the term chip for a binary code symbol $c_{j,l}$ for some fixed
$j \in \{0, \ldots, L + K - 2\}$ and $l \in \{1, \ldots, n\}$.

A periodic chip (word) interleaver transmits the chip $c_{j,l}$ via the
channel $C_p$ where $p = n(j - 1) + l \bmod N$ (resp. $p = j \bmod N$). A
random chip interleaver transmits $c_{j,l}$ via a channel which is selected at
random uniformly from $C$. A random word interleaver ensures that all
chips of a common word are transmitted via the same channel (which
is selected at random).
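
The encoder equation (1) and the two periodic interleavers can be sketched as follows. The function names, the example generators and the use of the string "?" for an erased chip are our own choices for illustration; the channel index formulas follow the text as we read it.

```python
def conv_encode(d, gens):
    """Encoder output of (1): L + K - 1 words, each an n-tuple of chips.

    d    : message bits d_0 .. d_{L-1}
    gens : n coefficient lists g^(l)_0 .. g^(l)_{K-1}
    """
    L, K = len(d), len(gens[0])
    words = []
    for i in range(L + K - 1):
        word = []
        for g in gens:
            bit = 0
            for v in range(K):
                if 0 <= i - v < L:  # zero padding outside the message
                    bit ^= g[v] & d[i - v]
            word.append(bit)
        words.append(tuple(word))
    return words


def chip_channel(j, l, n, N):
    """Periodic chip interleaver: channel index for chip c_{j,l}, l = 1..n."""
    return (n * (j - 1) + l) % N


def word_channel(j, N):
    """Periodic word interleaver: channel index shared by all chips of word j."""
    return j % N


def transmit(words, U, by_word=True):
    """Replace every chip sent over an erased channel (U[p] == 1) by '?'."""
    n, N = len(words[0]), len(U)
    received = []
    for j, word in enumerate(words):
        out = []
        for l, chip in enumerate(word, start=1):
            p = word_channel(j, N) if by_word else chip_channel(j, l, n, N)
            out.append("?" if U[p] else chip)
        received.append(tuple(out))
    return received
```

With an erasure pattern such as `U = [0, 1]`, the word interleaver erases whole words while the chip interleaver erases one chip of every word, which is the difference the error analysis has to account for.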

We consider a receiver which after de-interleaving uses maximum
likelihood sequence estimation (MLSE), i.e., Viterbi decoding, to
estimate the transmitted message. We assume that ties are resolved
randomly. Denote the decoder's estimate of the message by $\hat{d}_j$,
$j = 0, \ldots, L - 1$.

We show how given any erasure pattern $U$, one can compute the
probability of a message error,

$$P_{MSG}(U) = \Pr\{\exists\, 0 \le j \le L - 1 \text{ s.t. } \hat{d}_j \ne d_j\}, \tag{2}$$

the expected number of bits in error normalized by the message length,

$$P_L(U) = \frac{1}{L} \sum_{j=0}^{L-1} \Pr\{\hat{d}_j \ne d_j\}, \tag{3}$$

and the bit error rate

$$\mathrm{BER}(U) = \lim_{L \to \infty} P_L(U), \tag{4}$$

for the periodic word and chip interleavers.

In practical applications the erasure pattern $U$ is a random variable
and one is then interested in the weighted averages of (2), (3) or (4) over
all $2^N$ erasure patterns or in some other availability criterion which
can be similarly computed.

The asymptotic bit error rates (4) for chip and word interleavers
in the limit of infinite interleaving depth, i.e., as $N \to \infty$, are also
computed. Notice that in the limit of infinite interleaving (periodic or
random), the channel resembles a discrete memoryless erasure channel.
Our approach to the analysis of this case has benefitted from
the work of Burnashev and Cohn on the performance of convolutional
codes on the BSC [1]. The error rate associated with nonideal random
chip and word interleavers is also found.

Some numeric examples for scenarios based on the European
cellular phone system (GSM) [2] are provided.

The analysis of finite messages transmitted using deterministic
interleavers is mostly combinatoric in nature (once an erasure pattern
has been fixed). For this situation we give a set of linear recursion
equations which enable the computation of the probability of a
message error and the expected number of bits in error. The study of the
limit of the normalized expected number of bits in error (BER) for
this situation involves some algebra.
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This  paper  proposes  a  modified  transfer  function  analysis 
that yields the individual bit error probability for any specified
input bit position of an $(n,k,m)$ convolutional encoder.
The method is useful for analyzing the unequal error protection
(UEP) capabilities of codes.

Summary 

Unequal error protection (UEP) codes are of interest in several
environments, e.g. packet switched networks and multi-user
environments. A modified transfer function is now described
that can be used for analyzing the UEP capabilities of
convolutional codes.

We employ the method of determining a transfer function
from an augmented state diagram [1]. To modify the augmented
state diagram so that the individual bit error probabilities
are determined, each branch is assigned the new label
$X^t Y_1^{j_1} Y_2^{j_2} \cdots Y_k^{j_k}$, where $j_i$ is equal to the input bit in the $i$th
position, and $t$ is the Hamming weight of the branch output.
Obviously, the sum of the $j_i$'s is the Hamming weight of the
input vector. Mason's gain formula is then applied. The resulting
UEP transfer function is

$$T(X, Y_1, \ldots, Y_k) = \sum_{d = d_{free}}^{\infty} \sum_{j=1}^{l_d} c_{d,j} X^d Y_1^{h_{1,j}} \cdots Y_k^{h_{k,j}},$$

where $c_{d,j}$ is the number of paths associated with the $j$th input
sequence distribution of 1's that generates code vectors of weight
$d$, $l_d$ is the number of distinct input sequence distributions that
generate code vectors of weight $d$, and $h_{1,j}, \ldots, h_{k,j}$ represents a
particular input sequence distribution of 1's. The bound for the
individual bit error probabilities is then

$$P^{(i)}(E) < \sum_d B_d^{(i)} P_d, \quad 1 \le i \le k,$$

where $P^{(i)}(E)$ is the probability that a bit located in the $i$th position
of the input vector is decoded incorrectly, $B_d^{(i)} = \sum_{j=1}^{l_d} c_{d,j} h_{i,j}$ is
the total number of 1's in bit position $i$ contained in all input
vectors that generate code vectors of weight $d$, and $P_d = 2^d [p(1 -
p)]^{d/2}$. (For simplicity, we assume a binary symmetric channel
with crossover probability $p$.)
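
Once the coefficients $B_d^{(i)}$ have been read off the UEP transfer function, evaluating the bound for a given crossover probability is a short computation. The sketch below assumes the form $P_d = 2^d [p(1-p)]^{d/2}$ and a truncated sum. The coefficient tables are made-up placeholders for illustration only and do not come from the paper's Table 1.

```python
def pairwise_error(d, p):
    """P_d = 2^d [p(1-p)]^(d/2) for a BSC with crossover probability p."""
    return 2 ** d * (p * (1 - p)) ** (d / 2)


def bit_error_bound(B, p):
    """Truncated bound: sum over d of B[d] * P_d for one input position.

    B maps a code-vector weight d to the coefficient B_d for that position.
    """
    return sum(coeff * pairwise_error(d, p) for d, coeff in B.items())


# Hypothetical coefficients for two input positions (illustration only).
B_strong = {6: 1, 7: 3}  # lowest weight 6: better protected position
B_weak = {4: 1, 5: 2}    # lowest weight 4: less protected position
bound_strong = bit_error_bound(B_strong, 0.01)
bound_weak = bit_error_bound(B_weak, 0.01)
```

For small $p$ the term with the lowest weight dominates, so the position whose lowest weight is larger receives the smaller bound.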

The  modified  state  diagram  for  a  particular  (3,2,1)  code  is 
shown  in  Figure  1.  The  generator  vectors,  the  UEP  transfer 
function,  and  the  bit  error  bounds  are  shown  in  the  first  entry 
in  Table  1. 

Results for a number of other codes are also presented in
Table 1. It can be seen that several factors affect the bit error
probability for a specific input position. The first term of the
error probability expression is the dominant term. Obviously,
different exponents in the first terms result in large differences in
the error protection given to the input bits. The lowest distance
in the $i$th individual bit error bound is defined to be the effective
free distance, $d_{eff}(i)$. For example, for the (4,2,3) code in Table
1, $d_{eff}(0) = 6$ and $d_{eff}(1) = 4$. The effective free distance for
the $i$th position is the lowest Hamming weight among all code
vectors that are generated by input sequences with at least one
1 in the $i$th position. The individual effective free distances are
lower bounded by the overall free distance, $d_{free}$.

*This work was supported by NSF Grant EID90-17558, NSF Grant
NCR89-03429, and NASA Grant NAG5-557.

In addition to the individual effective free distances, two
other important factors affecting $P^{(i)}(E)$ are the number of low
weight code vectors and the number of 1's in position $i$ that
belong to input vectors corresponding to the low weight code
vectors. That is, in addition to the traditionally important
minimum codeword Hamming weight and multiplicity, the
distribution of 1's in the input vectors is important. The number
of 1's in a particular position is related to the length and to
the Hamming weight of the entire input vectors, but the exact
relationship has not been completely determined.

Using the insights gained from this new UEP analysis technique,
we can design new codes with different individual effective
free distances, and therefore, different levels of unequal error
protection.
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Research Problem 9.5 of MacWilliams and Sloane's book The Theory
of Error Correcting Codes asks for an improvement of the minimum
distance bound of the duals of BCH codes, defined over $\mathbb{F}_{2^m}$, $m$ odd.
The objective of this talk is to give a solution to the above problem
by: (i) obtaining an improvement to the Ax theorem, that we prove
is best possible for many classes of examples, (ii) establishing a sharp
estimate for the relevant exponential sums which implies a very good
improvement for the minimum distance bound, (iii) providing a doubly
infinite family of counterexamples to Problem 9.5 where both the
designed distance and the length increase independently, (iv) verifying
that our bound is tight for some of the counterexamples, and (v) in
the case of even $m$ we give a doubly infinite family of examples where
the Carlitz-Uchiyama bound is tight, and in this way determine the
exact minimum distance of the duals of the corresponding BCH codes.

More  specifically  we  have  the  following  results: 

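Theorem A: Let $F(x_1, \ldots, x_n)$ be a polynomial in $n$ variables
with coefficients in $\mathbb{F}_q$, $q = 2^m$. Let $\sigma(d)$ be the binary weight of $d$
and

$$s = \max_{(d_1, \ldots, d_n)} \big( \sigma(d_1) + \cdots + \sigma(d_n) \big),$$

where the maximum is taken over the degrees of all the monomials in
$F$. We then have that the exponential sum

$$S(F) = \sum_{x_1, \ldots, x_n \in \mathbb{F}_q} (-1)^{\mathrm{Tr}(F(x_1, \ldots, x_n))}$$

is divisible by $2^b$, where $b = \lceil mn/s \rceil$ is the smallest integer $\ge mn/s$.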
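
For very small fields the divisibility statement can be checked by brute force. The sketch below implements arithmetic in $\mathbb{F}_{2^m}$ with elements stored as bit masks, the absolute trace, and the exponential sum $S(F)$ for a polynomial given as a list of monomials. The representation, the function names and the example polynomial $x_1^3 x_2^3 + x_1$ over $\mathbb{F}_8$ are our own choices.

```python
from itertools import product


def gf_mul(a, b, m, poly):
    """Multiply in GF(2^m); poly is the reduction polynomial as a bit mask."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return r


def gf_pow(a, e, m, poly):
    r = 1
    for _ in range(e):
        r = gf_mul(r, a, m, poly)
    return r


def trace(a, m, poly):
    """Absolute trace a + a^2 + ... + a^(2^(m-1)); the result is 0 or 1."""
    t, x = 0, a
    for _ in range(m):
        t ^= x
        x = gf_mul(x, x, m, poly)
    return t


def exp_sum(monomials, n, m, poly):
    """S(F) for F given as a list of (coefficient, exponent tuple) monomials."""
    total = 0
    for x in product(range(2 ** m), repeat=n):
        value = 0
        for coeff, exps in monomials:
            term = coeff
            for xi, e in zip(x, exps):
                term = gf_mul(term, gf_pow(xi, e, m, poly), m, poly)
            value ^= term
        total += -1 if trace(value, m, poly) else 1
    return total


def divisibility_exponent(monomials, n, m):
    """b = ceil(m*n/s), with s the largest total binary weight of the exponents."""
    s = max(sum(bin(e).count("1") for e in exps) for _, exps in monomials)
    return -(-m * n // s)


F = [(1, (3, 3)), (1, (1, 0))]        # x1^3 x2^3 + x1 over GF(8)
S = exp_sum(F, n=2, m=3, poly=0b1011)  # reduction by x^3 + x + 1
b = divisibility_exponent(F, n=2, m=3)
```

Here $s = 4$, so $b = \lceil 6/4 \rceil = 2$, and the computed sum is indeed a multiple of $2^b$.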

Remark: The above is an improvement to a theorem of Adolphson-Sperber,
where the conclusion is that $S(F)$ is divisible by $2^{\lceil mn/r \rceil}$,
where $r$ is the degree of $F$. To compare with our result in a concrete
example, take a polynomial with 54 variables over a finite field with
$q = 8$ elements, and assume that there is a term of degree 22
and weight 4, and suppose there is no other term with $s > 4$. Then
from Theorem A we obtain that $2^{\lceil 162/4 \rceil} = 2^{41}$ divides $S(F)$ and in
Adolphson-Sperber we get divisibility by $2^{\lceil 162/22 \rceil} = 2^{8}$, which is
certainly smaller.

Theorem B: Let $q = 2^m$, and let

$$f(x) = \sum_j a_j x^{d_j}$$

be a polynomial with coefficients in $\mathbb{F}_q$. Suppose the maximum binary
weight of the exponents is

$$\ell = \max\{\sigma(d_j)\}.$$

Let $a$ be the smallest positive integer $\ge m/\ell$. We then have

r€F, 


Theorem  C:  Let  q  =  2®"',  m  odd  and  let  fix)  be  a  polynomial,  with 
coefficients  in  F,,  of  degree  r,  with  r  equal  to  7  or  9.  We  then  have 

^(_l)Tr(/(r))  <  (r-  1).2”'-'[2''’"v^]. 
r€F, 

Remark:  The  inequality  in  the  above  theorem  is  tight  for  q  =  2® 
and  q  =  2®®. 

Our  doubly  infinite  family  of  counterexamples  to  the  MacWilliams 
and  Sloane  question  is  given  by  the  following  result. 

Theorem D: For each prime $p$ for which 2 is of odd order exactly
$(p - 1)/2$, let $q = 2^{((p-1)/2)m}$, $\varepsilon > 0$. Then for infinitely many
odd $m$, as well as for infinitely many even $m$, we have

$$\sum_{x \in \mathbb{F}_q} (-1)^{\mathrm{Tr}(x^p)} > (p - 1)\sqrt{q}\,(1 - \varepsilon).$$

The  exact  minimum  distance  of  several  classes  of  BCH  codes  have 
been  previously  computed,  but  as  far  as  we  know,  no  exact  computa¬ 
tion  has  been  done  in  the  case  of  their  duals.  We  have  the  following 
result  in  that  respect: 

Theorem E: Let $\ell \mid 2^a + 1$ and let us further assume that $a$ is the
least integer with this property. Then for any $b$ we have:

$$\sum_{x \in \mathbb{F}_{2^{2ab}}} (-1)^{\mathrm{Tr}(x^\ell)} = (-1)^{b+1} 2^{ab} (\ell - 1).$$
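
The closed form in Theorem E, read as $\sum_{x \in \mathbb{F}_{2^{2ab}}} (-1)^{\mathrm{Tr}(x^\ell)} = (-1)^{b+1} 2^{ab} (\ell - 1)$, can be compared with a brute-force sum in the smallest cases. The sketch below does this for $\ell = 3$, $a = 1$ with $b = 1, 2$ (fields of 4 and 16 elements) and for $\ell = 5$, $a = 2$, $b = 1$. The field representation and all names are ours.

```python
def gf_mul(a, b, m, poly):
    """Multiply in GF(2^m); poly is the reduction polynomial as a bit mask."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return r


def trace(a, m, poly):
    """Absolute trace from GF(2^m) to GF(2); the result is 0 or 1."""
    t, x = 0, a
    for _ in range(m):
        t ^= x
        x = gf_mul(x, x, m, poly)
    return t


def power_sum(ell, m, poly):
    """Sum over x in GF(2^m) of (-1)^Tr(x^ell), by exhaustive enumeration."""
    total = 0
    for x in range(2 ** m):
        y = 1
        for _ in range(ell):
            y = gf_mul(y, x, m, poly)
        total += -1 if trace(y, m, poly) else 1
    return total


def predicted(ell, a, b):
    """Closed form stated in the theorem, for the field with 2^(2ab) elements."""
    return (-1) ** (b + 1) * 2 ** (a * b) * (ell - 1)


s_4 = power_sum(3, 2, 0b111)       # GF(4),  reduction by x^2 + x + 1
s_16 = power_sum(3, 4, 0b10011)    # GF(16), reduction by x^4 + x + 1
s_16_5 = power_sum(5, 4, 0b10011)  # ell = 5 divides 2^2 + 1
```

In each case the magnitude equals $(\ell - 1)\sqrt{q}$, which is the sense in which these monomials attain the Carlitz-Uchiyama bound.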

Corollary 1: Polynomials $x^\ell$ for $\ell \mid 2^a + 1$ and for the pairs $(a, b)$
provide a doubly infinite family of examples for which the
Carlitz-Uchiyama bound is tight over fields which are an even power of 2, and
of the form $\mathbb{F}_{2^{2ab}}$.

Corollary 2: The dual of the BCH code with designed distance $\ell + 2$
where $\ell \mid 2^a + 1$, and for any odd $b$, has minimum distance exactly
$2^{2ab-1} - (\ell - 1) 2^{ab-1}$ over the finite field $\mathbb{F}_{2^{2ab}}$.

The fundamental theorem of Chevalley-Warning has gone through
several improvements in the work of Ax, Katz, Mazur and
Adolphson-Sperber. What evolves in their work is the role played by the degree of
equations. It is remarkable that the techniques developed to solve the
Research Problem 9.5 of MacWilliams-Sloane, a deep question which
reflects the behavior of the weights of BCH codes, have provided a
new insight into the important role played by the $p$-adic weight of the
degrees in the study of the divisibility properties of the number of
solutions of a system of equations.

Our $p$-adic version of Serre's archimedean bound for the sum of the
roots of an $L$-function and the improvement of the theorem of Ax are
ample proof of the utility of the new techniques, both in coding theory
and number theory.

Since  the  results  in  this  paper  are  important  to  mathematicians 
and  were  previously  unsuspected  by  them,  they  are  another  example 
where  the  theory  of  error  correcting  codes  has  been  influential  in  the 
development  of  the  new  mathematical  insights.
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1 Introduction

Conway and Pless have enumerated in [6] the 85 non-equivalent
self-dual doubly-even codes of length 32. From these, the five
codes having minimum distance equal to 8 are called extremal.
Two extremal codes were already known: the second order
Reed-Muller code RM32 and the extended quadratic residue code QR32
of length 32. The other three discovered by Conway and Pless were
new. In [8], Koch has given another more direct construction of the
three Conway-Pless codes denoted by him F, U and G. Since the
five extremal codes, though non-equivalent, have the same weight
enumerator they also have the same classical parameters and it is
natural to ask for some parameters which may distinguish them.

In [3] we have introduced a new parameter, the regularity number
$r$ of a code, which is related to other fundamental parameters
[7] by the inequalities $e \le \rho \le t \le \gamma \le r$ where $e$ is the error
correcting capacity, $\rho$ the covering radius, $t$ the external distance (in
the linear case $t$ is the number of non-zero weights of the dual code)
and $\gamma$ is the number of distinct proper coset weight enumerators
of the given code. In [1, 2, 9] we have developed theoretical tools
based on the notion of partition design that permit us to calculate
(in principle) the coset weight enumerators once a subgroup of the
automorphism group of the code is given, more precisely when the
orbit space of this group is computable. The situation appeared
for example in [5] where the necessary notions and theorems are
stated.

In this work we obtain the coset weight enumerators, and the
parameters $\gamma$ and r, for the Reed-Muller code RM32, the quadratic
residue code QR32 of length 32 and the Conway-Pless code 16f2,
denoted by G in [8]. The coset weight enumerators for QR32 have
already been calculated by Assmus and Pless in [1].

2  Combinatorial  matrix  and  partition 
designs 

In [4, 2, 3] we have introduced the combinatorial matrix A of a
code C, which is related to the distance distribution matrix B [7]
(having as rows the coset weight enumerators of C) by the equality
A = BS, where S is an easily computed nonsingular triangular
matrix related to the Krawtchouk matrix. So the coset weight
enumerators are easily computable once we know the combinatorial
matrix A.

We have also introduced the concept of partition design admitted
by a code. In the case where C is a binary linear code of length
n and dimension n − k, let $\Omega$ be the set of columns of a parity
check matrix H of C. Then a partition $\pi = \{\Omega_0, \Omega_1, \ldots, \Omega_r\}$ of
the syndrome space is said to be an r-partition design admitted
by the code C if $\Omega_0 = \{0\}$, $\Omega_1 = \Omega$ and if, for $u, v \in \{0, 1, \ldots, r\}$,
$m_{uv} = \mathrm{card}((h - \Omega) \cap \Omega_v)$ is a constant for all $h \in \Omega_u$. The matrix
$M = (m_{uv})$ is said to be the associate matrix of $\pi$. The least possible
number r such that C admits an r-partition design is called the
regularity number of C, and the associate matrix of this minimal
partition is called the regularity matrix of the code C. This matrix
doesn't depend on the choice of the parity check matrix.

One important example of a partition design admitted by a code
is the set of orbits under any subgroup G of the automorphism
group of the code acting on the syndrome space.

We have also proved in an earlier work that the combinatorial
matrix A of a code C is completely determined by any partition
design admitted by the code via a linear recurrence relation. In the
theoretical part of this work, we give an algorithm that computes
the regularity matrix of a code from the knowledge of any given
partition design admitted by the code.

3  Results 

To apply the above theory, we take a permutation group $G \subset S_n$
leaving the code C = ker H invariant and we let it act on
the syndrome space $\mathbb{F}_2^k$ as follows. If $\sigma \in G$ and $h \in \mathbb{F}_2^k$, let x
be any element in $\mathbb{F}_2^n$ such that $h = Hx^T$ is the syndrome of x.
Then the element $\sigma(h) = H(\sigma x)^T$, where
$\sigma x = \sigma(x_1, \ldots, x_n) = (x_{\sigma(1)}, \ldots, x_{\sigma(n)})$, is uniquely determined by h.

We have written programs in the computer algebra system
Maple that implement the above theorems. To obtain the orbit
partition designs for the extended quadratic residue code and the
second order Reed-Muller code of length 32, we have taken the
full automorphism groups $PSL_2(31)$ and GA(5,2) respectively.
For the code G = 16f2 we have taken a subgroup of its automorphism
group generated by 12 automorphisms fixing the set of glue
components [6, p. 49], which has given 316 orbits in the syndrome
space. Then we have determined the combinatorial matrix A and,
by solving a triangular linear system, the distance matrix B, thus
obtaining the coset weight enumerators. Finally, applying our last
algorithm, we have obtained the regularity matrices and the
regularity numbers of the considered codes. We have observed that
all enumerators of the cosets of weight 1, 2, 3, 5 and 6 are the same
for the three codes.
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Weight  hierarchies  of  binary  linear  codes  of  dimension  4 
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For any linear code D, Supp(D), the support of D, is the set of
positions where not all the codewords of D are zero. Let $w_S(D)$,
the support weight of D, be the size of Supp(D). For an [n,k]
code C and any r, where $0 < r \le k$, the r-th minimum support
weight (also known as the r-th generalized Hamming weight) is
defined by

$d_r(C) = \min\{ w_S(D) \mid D \text{ is an } [n,r] \text{ subcode of } C \}$.

The weight hierarchy of C is the set $\{d_1(C), d_2(C), \ldots, d_k(C)\}$.

For $k \le 4$ we give an explicit description of the possible weight
hierarchies.
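The definition above can be checked by brute force on a small code. The sketch below is only an illustration and not the method of the abstract: it enumerates every r-dimensional subcode of a code given by a generator matrix and takes the smallest support size. The generator matrix `HAMMING_7_4` and the helper names are assumptions introduced here for the example.

```python
from itertools import combinations, product

# Assumed example: a systematic generator matrix of the [7,4] Hamming code.
HAMMING_7_4 = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]

def span(gens, n):
    """Return all GF(2) linear combinations of the given rows."""
    words = set()
    for coeffs in product([0, 1], repeat=len(gens)):
        w = [0] * n
        for c, g in zip(coeffs, gens):
            if c:
                w = [a ^ b for a, b in zip(w, g)]
        words.add(tuple(w))
    return words

def support_weight(words, n):
    """Number of positions where not all words are zero."""
    return sum(1 for i in range(n) if any(w[i] for w in words))

def weight_hierarchy(gens):
    """Brute-force d_1, ..., d_k; assumes the rows of gens are independent."""
    n, k = len(gens[0]), len(gens)
    nonzero = [w for w in span(gens, n) if any(w)]
    hierarchy = []
    for r in range(1, k + 1):
        best = n + 1
        for basis in combinations(nonzero, r):
            sub = span(basis, n)
            if len(sub) == 2 ** r:  # the r chosen words are independent
                best = min(best, support_weight(sub, n))
        hierarchy.append(best)
    return hierarchy

print(weight_hierarchy(HAMMING_7_4))
```

The search is exponential in the dimension, so it is only meant for codes as small as the dimension-4 case discussed here.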
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Any  partition  of  the  coordinate  set  of  a  binary  linear  code 
is  shown  to  correspond  to  a  set  of  generalized  MacWilliams 
identities.  Thus,  a  well-chosen  partition  yields  a  promising 
method  to  settle  existence  and  uniqueness  problems.  A  short 
proof  of  a  generalization  of  the  Assmus-Mattson  theorem  is 
given.  In  the  nonlinear  case,  a  generalization  of  the  Delsarte 
inequalities  is  obtained. 

The  main  result 

Coordinate  partitions 

Let C be a binary linear code, with coordinate set S, and
let $\mathcal{T} := \{T_1, T_2, \ldots, T_p\}$ be a partition of S into sets $T_u$ of size
$n_u := |T_u|$. Then the weight distribution $A(\mathcal{T})$ of the code C with
respect to the partition $\mathcal{T}$ is the set of nonnegative integers
$A_i(\mathcal{T}) := |\{X \in C \mid |X \cap T_u| = i_u \ \forall u\}|$.

The  generalized  MacWilliams  identities 

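We show that for each index vector i the weight distribution
$A(\mathcal{T})$ of C and the weight distribution $B(\mathcal{T})$ of the dual
code satisfy the equation

$$A_i(\mathcal{T}) = 2^{k-n} \sum_{j} \Bigl( \prod_{u=1}^{p} P_{i_u}(j_u; n_u) \Bigr) B_j(\mathcal{T}), \qquad (*)$$

where

$$P_i(x; n) := \sum_{l=0}^{i} (-1)^l \binom{x}{l} \binom{n-x}{i-l}$$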

is  the  Krawtchouk  polynomial  of  degree  i.  The  equations  (*)  will 
be  called  the  MacWilliams  identities  of  C  with  respect  to  the 
partition  7.  They  generally  give  more  information  about  the 
existence  and  the  uniqueness  of  the  code,  but  the  price  is  a 
steeply  increasing  calculation  effort.  Nevertheless,  a  happy 
combination  of  a  powerful  computer  and  additional  information 
on  the  codes  under  consideration  should  settle  quite  a  few  open 
problems.  The  number  of  equations  is  substantially  reduced  if  C 
has a large minimum weight or if C is a doubly even self-dual
code.
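A small numerical check can make the identities (*) concrete. The sketch below assumes the normalization $2^{n-k} A_i(\mathcal{T}) = \sum_j \bigl(\prod_u P_{i_u}(j_u; n_u)\bigr) B_j(\mathcal{T})$, with B taken on the dual code, and verifies it by exhaustive enumeration for a [7,4] Hamming code. The generator matrix, the chosen partitions and all function names are assumptions made for this illustration.

```python
from collections import Counter
from itertools import product
from math import comb

# Assumed example: a systematic generator matrix of the [7,4] Hamming code.
HAMMING_7_4 = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]

def span(gens):
    """All GF(2) linear combinations of the generator rows."""
    n = len(gens[0])
    return {
        tuple(sum(c * g[i] for c, g in zip(coeffs, gens)) % 2 for i in range(n))
        for coeffs in product([0, 1], repeat=len(gens))
    }

def dual(code, n):
    """All vectors orthogonal to every codeword (brute force)."""
    return {
        v for v in product([0, 1], repeat=n)
        if all(sum(a * b for a, b in zip(v, c)) % 2 == 0 for c in code)
    }

def krawtchouk(i, x, n):
    """Binary Krawtchouk polynomial P_i(x; n)."""
    return sum((-1) ** l * comb(x, l) * comb(n - x, i - l) for l in range(i + 1))

def partition_distribution(code, parts):
    """Count codewords by their vector of weights on each block of the partition."""
    return Counter(tuple(sum(w[c] for c in part) for part in parts) for w in code)

def macwilliams_holds(gens, parts):
    n = len(gens[0])
    code = span(gens)
    dual_code = dual(code, n)
    A = partition_distribution(code, parts)
    B = partition_distribution(dual_code, parts)
    sizes = [len(part) for part in parts]
    for i in product(*(range(s + 1) for s in sizes)):
        total = 0
        for j, count in B.items():
            term = count
            for iu, ju, nu in zip(i, j, sizes):
                term *= krawtchouk(iu, ju, nu)
            total += term
        if total != len(dual_code) * A[i]:  # len(dual_code) = 2^(n-k)
            return False
    return True

print(macwilliams_holds(HAMMING_7_4, [(0, 1, 2), (3, 4, 5, 6)]))
```

With the one-block partition the same routine reduces to the classical MacWilliams identity, which gives a quick sanity check of the normalization.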


[n,k] code C can be calculated from the $A_{i,j}(\mathcal{T})$ with $i + j < \delta$
and the $B_{u,v}(\mathcal{T})$ with $(u,v) \notin G$.

The  covering  radius 

The  following  reformulation  of  this  notion  in  terms  of 
coordinate  partitions  might  be  of  some  use. 

Proposition: Let C be a binary linear code of length n and
covering radius $\rho$. Then $\rho \ge t$ if and only if a (t,n−t)-partition $\mathcal{T}$
exists such that $A_{i,j}(\mathcal{T}) = 0$ for all i, j for which i > j.

Generalized  Delsarte  inequalities 

Finally, if we define the inner distribution of the nonlinear
code $C \subset \mathbb{F}_2^n$ with respect to the partition $\mathcal{T}$ to be the set of
nonnegative rational numbers

$A_i(\mathcal{T}) := |C|^{-1} \, |\{(X,Y) \in C \times C \mid w_{\mathcal{T}}(X-Y) = i\}|$,

we can derive the following generalization of the Delsarte
inequalities (cf. [2]):

$\sum_{j} \Bigl( \prod_{u=1}^{p} P_{i_u}(j_u; n_u) \Bigr) A_j(\mathcal{T}) \ge 0 \quad \forall i$.
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The  exact  weight  distribution 

The extreme case is the exact weight distribution of C, i.e.
its weight distribution with respect to the partition
$\mathcal{E} := \{\{1\}, \{2\}, \ldots, \{n\}\}$

of the coordinate set S into its one-element subsets. Cf. [1].
Clearly, a code is completely determined by its exact weight
distribution. Conversely, any nontrivial (0,1) solution of the
MacWilliams identities with respect to the partition $\mathcal{E}$ will be
shown to correspond to a binary linear code.


The  Assmus-Mattson  theorem 

This famous result in [1] states sufficient conditions
under which the words of fixed weight in a code form a t-design.
We give a simple proof of the following extended version.

Proposition: Let $\mathcal{T}$ be a (t,n−t)-partition, let $\delta$ be an
integer greater than t and let $G \subset \{0, 1, \ldots, t\} \times \{0, 1, \ldots, n-t\}$
be a subset for which the "row weights"
$|\{j \in G \mid j_1 = u\}|$ form a permutation of $\{\delta, \delta-1, \ldots, \delta-t\}$.
Then the weight distributions $A(\mathcal{T})$ and $B(\mathcal{T})$ of a binary linear
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The finite field of cardinality q is denoted by $\mathbb{F}_q$. Let
n, s, k be integers such that $ns = 2^k - 1$. Let g(x) be a divisor
of $x^s - 1$ over $\mathbb{F}_2$, and let $\pi(x)$ be a primitive polynomial of
degree k over $\mathbb{F}_2$. We consider the following binary codes:

The cyclic code C, of length $N = 2^k - 1$, generated by

$\dfrac{x^N - 1}{g(x)\pi(x)}$

The cyclic code $\Gamma$, of length s, generated by


Theorem  1.  If  k  =  2t,  s  =  2^  +  1,  then  the  set  of  non¬ 
zero  weights  of  C  is  the  set  of  all  integers  (2‘  —  l)tn, 

22* -1  _  2^'“'  +  2‘  —  tc  such  that  w  is  a  non- zero  weight 

ofT. 

Theorem  2.  If  k  =  2t,  s  =  2‘  +  1,  and  if  g{x)  is  a  primitive 
divisor  0/  x*  -  1  over  F2.  then  the  set  of  non-zero  weights 
of  C  is  the  set  of  all  integers  2^'~',  (2‘  -  1)uj,  2^‘~’  -  w. 
22t-i  ^2*  —  w  such  that  w  is  even  and 


Theorem  3.  If  k  —  2t,  and  if  g(x)  =  x'  then  the  set 
of  non-zero  weights  ofC  is  the  set  of  all  the  integers  2^*“*, 
(2‘  —  l)tc,  2^‘“*  —  w,  2^‘“*  +  2‘  —  tc  such  that  w  is  even  and 
2  ^  u)  $  2*. 

Theorem  4.  If 

a)  W2I’  is  the  splitting  field  n/  x"  -  1  over  IF2 : 

b)  k  =  2t,  and  there  exists  a  divisor  r  of  t  such  that 
2'"  =  —  1  (mod  s), 

then  the  set  of  non-zero  weights  of  of  all  the  integers 


{2‘  +  l)| 


where  e  =  (-1)'/’',  and  such  that  w  is  a  non-zero  weight  of 

r. 




A  threshold  property  of  linear  codes 


Gilles  Zemor,  and  Gerard  D.  Cohen 
ENST 

46  rue  Barrault 
75634  Paris  Cedex  13 
France

email: zemor@res.enst.fr, cohen@inf.enst.fr


Abstract 

We define and estimate the threshold probability $\theta$
of a linear code, using a theorem of Margulis originally
conceived for the study of the probability of
disconnecting a graph. We then apply this concept
to the study of the erasure and Z-channels, for which
we propose linear coding schemes that admit simple
decoding. We show that $\theta$ is particularly relevant
to the erasure channel since linear codes achieve a
vanishing error probability as long as $p < \theta$, where
p is the probability of erasure. Binomial codes have
the highest possible $\theta$ (and achieve capacity). As for
the Z-channel, a subcapacity is derived with respect
to the linear coding scheme. For a transition probability
in the range ]log(3/2); 1[, we show how to
achieve this subcapacity. As a by-product we obtain
improved constructions and existential results for
intersecting codes (linear Sperner families) which are
used in our coding schemes.

Summary 

We  investigate  and  apply  a  seldom  studied  prop¬ 
erty  of  linear  codes,  namely  the  fact  that  they  tend 
to  display  the  following  “threshold”  phenomenon. 
Let us consider a binary linear code C, of parameters
[n, k, d], and let us choose randomly a vector v
of length n such that every coordinate is given
independently the value “1” with probability p and the
value “0” with probability 1 − p, 0 < p < 1. Call
$f_C(p)$ the probability with which v “covers” some
non-zero codeword of C (i.e. is such that the support
supp(v) of v contains the support of some non-zero
codeword c). In other words

$f_C(p) = \sum_{v \in W(C)} p^{|v|} (1-p)^{n-|v|}$

where |v| denotes the weight of v and

$W(C) = \{ v \mid \mathrm{supp}(v) \supseteq \mathrm{supp}(c),\ c \in C,\ c \ne 0 \}$.

The behaviour we focus on is that whenever C has
a large enough minimal distance, the (non-decreasing)
function $p \mapsto f_C(p)$ jumps suddenly from almost
zero to almost one, around a “threshold” probability
$\theta$. We will show how this fact stems from a
theorem of Margulis, originally designed to prove a
threshold phenomenon for the probability f(p) of
disconnecting a graph, when every edge is severed
with probability p.
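For a very short code, the function $f_C(p)$ can be evaluated exactly by listing the covering set W(C). The sketch below does this for a [7,4] Hamming code, which is an assumed example chosen only because it is small; it is far too short to show a sharp threshold, and the function names are hypothetical.

```python
from itertools import product

# Assumed example: a systematic generator matrix of the [7,4] Hamming code.
HAMMING_7_4 = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]

def nonzero_codewords(gens):
    """All non-zero GF(2) linear combinations of the generator rows."""
    n = len(gens[0])
    words = set()
    for coeffs in product([0, 1], repeat=len(gens)):
        w = tuple(sum(c * g[i] for c, g in zip(coeffs, gens)) % 2 for i in range(n))
        if any(w):
            words.add(w)
    return words

def covering_counts(gens):
    """counts[w] = number of weight-w vectors whose support contains
    the support of some non-zero codeword, i.e. the elements of W(C)."""
    n = len(gens[0])
    code = nonzero_codewords(gens)
    counts = [0] * (n + 1)
    for v in product([0, 1], repeat=n):
        if any(all(vi >= ci for vi, ci in zip(v, c)) for c in code):
            counts[sum(v)] += 1
    return counts

def f_C(p, counts):
    """Probability that a random vector with i.i.d. Bernoulli(p) entries lies in W(C)."""
    n = len(counts) - 1
    return sum(a * p ** w * (1 - p) ** (n - w) for w, a in enumerate(counts))

counts = covering_counts(HAMMING_7_4)
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(f_C(p, counts), 4))
```

Because W(C) is closed under adding ones to a vector, the printed values increase with p, in line with the non-decreasing behaviour stated above.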

Threshold  phenomena  have  been  studied  exten¬ 
sively  in  the  context  of  random  graphs.  We  have 
tried  to  apply  those  techniques  to  the  coding  con¬ 
text,  and  draw  some  consequences. 

We will first place ourselves in the context of the
erasure channel, and show that the threshold probability
is a particularly relevant parameter for measuring
the efficiency of a linear code.

We  will  also  discuss  at  some  length  an  applica¬ 
tion  of  the  threshold  phenomenon  to  the  problem  of 
devising  efficient  codes  for  the  asymmetrical  channel 
(the so-called Z-channel) where every 0 can be transformed
into a 1 with a given probability p, while 1's
are always correctly received. In this setting, decoding
of a received vector is unambiguous whenever
the latter covers no codeword apart from the one
that was initially sent. The idea, broadly speaking,
is to use linear codes with properly chosen threshold
properties: the point is, the probability that the received
vector covers some parasite codeword should
be very small whenever the proportion of 0 → 1
faulty transitions stays under a threshold value.

We  will  show  why  highly  intersecting  codes  are 
a  good  choice,  provide  some  constructions,  and  dis¬ 
cuss  their  behaviour  relative  to  the  capacity  of  the 
Z-channel.  It  will  turn  out  that  for  high  error  prob¬ 
abilities  (e.g.  0.586  <  p  <  1)  our  schemes  perform 
quite  acceptably. 
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Summary 

Primitive  cyclic  codes  in  a  multiplicative  group 
algebra. 

A primitive cyclic code over a finite field K = GF(q) is
a cyclic code of length $n = q^m - 1$.

Let G* be the multiplicative group of the finite field
$G = GF(q^m)$. We consider such a code as an ideal of
the modular algebra M = K[G*]. An element of M is
a formal sum

$x = \sum_{g \in G^*} x_g (g), \quad x_g \in K.$

A primitive cyclic code C of defining-set
$T \subset \{0, \ldots, n-1\}$ is the set

$C = \{ x \in M \mid \rho_s(x) = 0, \ \forall s \in T \}$

where $\rho_s\bigl(\sum_{g \in G^*} x_g (g)\bigr) = \sum_{g \in G^*} x_g g^s$,
the sum $\sum_{g \in G^*} x_g g^s$ being not a formal sum, but
calculated in $GF(q^m)$.

This definition is equivalent to the usual definition
in the algebra $R = K[X]/(X^n - 1)$. If $\alpha$ is a primitive
root of G, then the isomorphism is:

$\phi : R \to M$

Permutation  groups  of  cyclic  codes. 

The permutation group of a cyclic code C is the group
Per(C) of permutations of the support G* which leave
C globally invariant.

It is known (cf. [2]) that each permutation $\sigma \in S(G)$
admits a unique representation polynomial of degree less
than $p^m$:

$f(X) = \sum_{i=0}^{p^m-1} \lambda_i X^i, \quad \lambda_i \in G, \text{ and } \sigma(g) = f(g).$

Theorem 1 Let C be a primitive cyclic code, and T its
defining-set.

A permutation $\sigma \in S(G^*)$, with associated polynomial
$f(X) = \sum_i \lambda_i X^i$, is a permutation of C if and only
if, for all $s \in T$, the polynomial $f(X)^s \bmod X^{p^m} - X$
has exponents in T, i.e. $f(X)^s = \sum_{j \in T} \mu_j X^j$, for some
$\mu_j \in G$.


Automorphism  groups  of  the  binary  double¬ 
error-correcting  BCH  codes 

The binary double-error-correcting BCH code (cf. [4]) is
the BCH code over GF(2) of designed distance 5 and
length $2^m - 1$ (m > 2); its defining-set is:

$T = \{2^i, \ 2^i + 2^{i+1} \mid i \in \{0, \ldots, m-1\}\}$

For  m  =  3,  this  code  is  trivial:  it  is  the  repetition 
code  of  length  7.  We  suppose  m  >  3. 
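The defining-set above is easy to tabulate. The sketch below computes T with exponents reduced modulo $n = 2^m - 1$ and uses the standard fact that a cyclic code of length n with defining-set T has dimension n − |T|; the function names are assumptions for this illustration.

```python
def defining_set(m):
    """Defining-set {2^i, 2^i + 2^(i+1)} of the double-error-correcting
    BCH code of length 2^m - 1, with exponents reduced modulo the length."""
    n = 2 ** m - 1
    return ({(2 ** i) % n for i in range(m)}
            | {(2 ** i + 2 ** (i + 1)) % n for i in range(m)})

def dimension(m):
    """Dimension of the cyclic code: length minus the size of the defining-set."""
    return (2 ** m - 1) - len(defining_set(m))

for m in (3, 4, 5):
    print(m, sorted(defining_set(m)), dimension(m))
```

For m = 3 the set T contains every non-zero exponent modulo 7, so the dimension is 1, which matches the remark that the code is the repetition code of length 7.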

Let $B_m$ denote the binary double-error-correcting
BCH code of length $2^m - 1$. Let $\sigma$ be a permutation
of $B_m$, with associated polynomial f(X). Applying the
criterion of Theorem 1, for s = 1 we deduce

m  —  1  m  —  1 

/(.V)  =  y; 

*=0  *=0 

and  for  s  =  3,  we  obtain 

f{X?  =  ^ 

,  Y"'  /  1.2  ,  2  I  \  \-2'+'’^  +  '>J  +  * 

+  2^  -l-a._ii>)A 

•.je{0,  ,m-l} 

The permutation $\sigma$ is in Per($B_m$) if and only if its associated
polynomial f(X) is a permutation polynomial
and $f(X)^3$ is a polynomial with exponents in T.

Using  these  conditions,  we  deduce  the  following  the¬ 
orem: 

Theorem 2 For m > 4, the automorphism group of the
BCH code $B_m$ is the semi-linear group of $GF(2^m)$ over
$GF(2^m)$. For m = 4, the automorphism group of the
BCH code $B_4$ is the semi-linear group of GF(16) over
GF(4).

Corollary 1 For m > 4, the automorphism group of
the extended binary double-error-correcting BCH code
of length $2^m$ is the semi-affine group of $GF(2^m)$. For
m = 4, its automorphism group is the semi-affine group
of GF(16) over GF(4).
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Abstract 


For  comparison,  the  Korner-Marton  graph-entropy  bound  [3]  states 
(in  the  above  notation)  that 


We  present  a  new  bound  on  the  zero-error  list  coding  capacity,  and  using 
which,  show  that  the  list-of-3  capacity  of  the  4/3  channel  is  at  most  6/19 
bits,  improving  the  best  previously  known  bound  of  3/8.  The  relation 
of  the  bound  to  the  graph-entropy  bound  of  Korner  and  Marton  is  also 
discussed. 


The  Bound 


Consider a discrete memoryless channel $K = (I, J, P)$ where I denotes
the input alphabet, J the output alphabet, and P(j|i) the probability
that $j \in J$ is received given that $i \in I$ is transmitted. A set $S \subset I^N$ is
called independent if for every $y \in J^N$

$\prod_{x \in S} \prod_{n=1}^{N} P(y_n \mid x_n) = 0.$

A set $C \subset I^N$ is called a zero-error list-of-L code, $L \ge 1$, if every $S \subset C$
with |S| = L + 1 is an independent set. Zero-error list-of-L capacity is
defined by

$C_L = \limsup_{N \to \infty} \frac{1}{N} \log M(N, L)$

where M(N, L) is the maximum possible size for a list-of-L code of length
N. (All logarithms are to base 2.)


rnA  Y!  f  (  A|n)  •  .  .  -  Ktlz^rm4.nnT  •  •  •  -  ^kn) 

nsl 

where  the  outer  summation  is  over  all  possible  choices  of  distinct 
codewords  .,xi;  6  C.  Thus,  the  Korner-Marton  bound  up- 

perbounds  the  rale  ft  by  (essentially)  the  average  of  the  quantity 

^n=i  f(Ai„, . .  . XL,,),  whereas  here  ft  is  bounded 

by  the  minimum  of  the  same  quantity. 

The  bound  here  may  also  be  seen  as  a  generalization  of  the  Shannon 
bound  on  zero-error  capacity  [1],  [2].  Shannon’s  bound  is  obtained  by 
looking  at  the  zero-error  code  through  a  single  user  channel;  here  we 
look  at  the  code  through  a  multiaccess  channel. 


The  4/3  Channel 


The 4/3 channel has a four letter input and output alphabet A =
{0, 1, 2, 3}, and the transition probabilities P(j|i) = 1/3 for all $i, j \in A$,
$i \ne j$. The bound $C_3 \le 6/19$ is obtained (after some manipulation)
by  applying  the  above  theorem  using  the  following  P' .  (i)  For  any 
€  A,  ft'(>|fi,J,i)  =  (S,j.  (ii)  For  any  ii.ij.is,/  €  A  with  fj  /  (3, 


( 


0 

(4  -  |{il.l2,t3}|)"' 


if  i  6  {ti, ij, 13}; 
otherwise. 


We call a channel k-uniform if k is the smallest integer for which
$C_k > 0$. The new bound is as follows.
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Theorem  1  The  rate  ft  of  any  list-of-k  code  C  on  a  k-uniform  channel 
A”  satisfies 

1 

ft-f<  min  min  min — ^  Y] /( Ai„, . . . ,  A,„„;  . . .  ,i*n) 

l<m<A;rfn+i . Xk  P'  niiV  " 

~  —  n=l 

where  P'  ranges  through  all  conditional  probability  assignments  such  that 
whenever  {ij , . . . ,  i„,,  j', , . . . ,  i(„,  t„,+i. . . . ,  i*}  is  independent  in  K 

P'(j\i . .  h„-e\ . ik)P'(j\i\,  ■■  ■,  C,- 'm-n,  ■  - . ,  it)  =  0 

for  all  j.  The  mutual  information  term  is  computed  using  the  probability 
assignment 
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Pr {X i,,  —  z”],, , . . . ,  A  „t,,  —  z*,,,,, ,  Y,,  —  y„ }  — 

Qn(^in)  ■  '  'Qni^mn  )P  (jtn  Iz^ljn  ■  -  • ,  Z'tn) 

where $Q_n$ is the empirical distribution of the nth coordinate of the
codewords in C, i.e., $Q_n(i)$ equals the fraction of codewords $x \in C$ with $x_n = i$,
$i \in I$. The number $\epsilon$ goes to zero as N increases for any fixed R > 0.
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Abstract 


Given  a  channel  and  an  input  process  we  study  the  minimum  randomness  of 
those  input  processes  whose  output  statistics  approximate  the  original  out¬ 
put  statistics  with  arbitrary  accuracy.  We  introduce  the  notion  of  resolva¬ 
bility  of  a  channel,  defined  as  the  number  of  random  bits  required  per 
channel  use  in  order  to  generate  an  input  that  achieves  arbitrarily  accu¬ 
rate  approximation  of  the  output  statistics  for  any  given  input  process.  We 
obtain  a  general  formula  for  resolvability  which  holds  regardless  of  the 
channel  memory  structure.  We  show  that,  for  most  channels,  resolvability 
is  equal  to  Shannon  capacity. 

By-products  of  our  analysis  are  a  general  formula  for  the  minimum 
achievable  (fixed-length)  source  coding  rate  of  any  finite-alphabet  source, 
and  a  strong  converse  of  the  identification  coding  theorem,  which  holds  for 
any  channel  that  satisfies  the  strong  converse  of  the  channel  coding 
theorem. 


There  are  situations  of  practical  interest  where  a  random  process  needs 
to  be  generated  with  some  specified  statistics.  In  order  to  generate  a  ran¬ 
dom  process  we  assume  that  a  primary  random  source  with  an  equiprobable 
distribution  is  available  (e.g.  a  stream  of  independent  fair  coin  flips).  A  key 
measure  of  the  complexity  of  a  random  process  is  the  rate  at  which  its  most 
efficient  generator  requires  random  bits,  in  order  to  generate  every  sample- 
path  of  the  random  process.  This  question  becomes  particularly  interesting 
when  rather  than  requiring  the  exact  reproduction  of  the  desired  statistics, 
we  require  an  arbitrarily  accurate  approximation  of  the  finite-dimensional 
distributions.  This  requires  the  introduction  of  a  measure  of  distance 
between  the  desired  and  generated  distributions:  in  this  paper  we  focus  most 
of our attention on the variational or $l_1$ distance. We prove that for any
random  process  the  minimum  complexity  required  to  approximate  its  statis¬ 
tics  is  equal  to  its  minimum  achievable  fixed-rate  (noiseless)  source  coding 
rate,  and  that  this  rate  is  equal  to  the  sup-entropy  rate  of  the  random  pro¬ 
cess.  The  Asymptotic  Equipartition  Property  plays  no  role  in  the  proof  of 
this result, not only because it is not powerful enough to yield an
approximation result in the sense of variational distance, but because the result
holds  for  processes  that  are  not  necessarily  ergodic  or  stationary.  The  proof 
uses  a  new  technique  we  refer  to  as  repetition. 

Some practical situations such as system simulation or the remote
artificial generation of random processes such as speech sounds or image
textures, suggest an important generalization of the foregoing setup: Given
an input process and a channel, we want to approximate the resulting output
process. However, this problem does not boil down to the previous setup
when the approximation has to be accomplished by generating the input.
We define the resolvability of a channel as the number of random bits per
input sample required to achieve arbitrarily accurate approximation of the
output statistics regardless of the actual input process. Intuitively, we can
anticipate that the resolvability of a system will depend on how "noisy" it
is. A coarse approximation of the input statistics whose generation requires


comparatively few bits will be good enough when the system is very noisy,
because, then, the output cannot reflect any fine detail contained in the input
distribution.

Although the problem of approximation of output statistics involves no
codes of any sort or the transmission/reproduction of information, its
analysis and results turn out to be Shannon theoretic in nature. In fact, our
main conclusion is that (for most channels) resolvability is equal to Shannon
capacity.

More concretely we show that the resolvability of an arbitrary channel
is  equal  to  the  supremum  of  the  input-output  sup-information  rate,  and  that 
this  quantity  coincides  with  the  Shannon  capacity  if  and  only  if  the  channel 
satisfies  the  strong  converse. 

In addition to the abovementioned connections with the theories of
source coding and channel coding, the approximation of output statistics is
related to the problem of identification via channels introduced by Ahlswede
and Dueck [1]. Although a completely general direct identification coding
theorem is known [1,2], its converse had been shown only in a so-called soft
version in [1] and in the strong sense in [3], but always within the context
of discrete memoryless channels. Here, we show a general strong converse
to the identification coding theorem which follows as a simple consequence
of the achievability part of the resolvability theorem.

The paper also investigates the effect of replacing the worst-case
complexity measure by the average number of random bits required for
approximation, as well as the replacement of variational distance by normalized
divergence. In the cases considered, the foregoing conclusions remain valid.

We conclude with another result within the approximation theory of
output statistics which formalizes a folk-theorem in channel coding: the
output distribution due to any good channel code (a code with rate close to
capacity and vanishing error probability) must approximate the output
distribution due to the input that maximizes mutual information, and thus
achieves capacity.

The  journal  version  of  this  paper  is  to  appear  in  [4], 
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Shannon  introduced  the  concept  of  zero-error  capacity  of  a  discrete 
memoryless  channel.  The  channel  determines  an  undirected  graph  on 
the  symbol  alphabet,  where  adjacency  means  that  symbols  cannot  be 
confused  at  the  receiver.  The  zero-error  or  Shannon  capacity  is  an 
invariant  of  this  graph.  Gargano,  Korner,  and  Vaccaro  have  recently 
extended  the  concept  of  Shannon  capacity  to  directed  graphs.  Their 
generalization  of  Shannon  capacity  is  called  Sperner  capacity.  We 
resolve  a  problem  posed  by  these  authors  by  giving  the  first  example 
(the  two  orientations  of  the  triangle)  of  a  graph  where  the  Sperner 
capacity  depends  on  the  orientations  of  the  edges. 

Sperner  capacity  seems  to  be  achieved  by  nonlinear  codes,  whereas 
Shannon  capacity  seems  to  be  attainable  by  linear  codes.  In  particular, 


linear  codes  do  not  achieve  Sperner  capacity  for  the  cyclic  triangle. 
We  use  Fourier  analysis  or  linear  programming  to  obtain  the  best 
upper bounds for linear codes. The bounds for unrestricted codes are
obtained  from  rank  arguments,  eigenvalue  interlacing  inequalities  and 
polynomial  algebra. 

The statement of the cyclic q-gon problem is very simple: what is
the maximum size $N_q(n)$ of a subset $S_n$ of $\{0, 1, \ldots, q-1\}^n$ with the
property that for every pair of distinct vectors $x = (x_i)$, $y = (y_i) \in S_n$,
we have $x_j - y_j \equiv 1 \pmod{q}$ for some j? For q = 3 (the cyclic triangle),
we show $N_3(n) \sim 2^n$. If however $S_n$ is a subgroup, then we give a
simple proof that $|S_n| \le \sqrt{3}^{\,n}$.
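For very small n the cyclic q-gon problem can be explored by exhaustive search. The sketch below assumes that the condition is meant for every ordered pair of distinct vectors, which is the usual reading for Sperner capacity but is an interpretation made here, not something the abstract states explicitly. The function names are hypothetical, and the search is only feasible for n up to about 2.

```python
from itertools import combinations, product

def is_valid(S, q=3):
    """Check that for every ordered pair of distinct vectors x, y in S
    there is a coordinate j with x_j - y_j congruent to 1 modulo q
    (ordered-pair reading, an assumption of this sketch)."""
    return all(
        any((a - b) % q == 1 for a, b in zip(x, y))
        for x in S for y in S if x != y
    )

def max_family(n, q=3):
    """Exhaustive search for a largest valid subset of {0, ..., q-1}^n."""
    vectors = list(product(range(q), repeat=n))
    for size in range(len(vectors), 0, -1):
        for S in combinations(vectors, size):
            if is_valid(S, q):
                return list(S)
    return []

for n in (1, 2):
    best = max_family(n)
    print(n, len(best), best)
```

As a hand-checkable instance, the family {(0,1), (1,0), (2,2)} satisfies the condition for q = 3 and n = 2.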
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Abstract 

(X,  Y,  Z)  is  an  ensemble  of  independent  random  triples, 
each  distributed  according  to  some  probability  distribu¬ 
tion  p{x,y,z).  Two  legitimate  users,  Px  having  X  and 
Py  having  Y,  communicate  in  order  to  agree  on  a  joint 
key  while  keeping  it  almost  unknown  to  an  eavesdropper 
Pz  who  knows  Z.  Communication  is  conducted  over  a 
noiseless  channel  according  to  a  predetermined  proto¬ 
col.  Pz  hears  all  transmissions  over  the  channel  and 
knows  the  protocol  used.  We  show:  (1)  The  legitimate 
communicators  can  agree  on  the  secret  if  and  only  if 
they  can  find  one  using  just  two  messages.  (2)  There 
are  cases  where  a  secret  can  be  found,  but  one  message 
does  not  suffice.  (3)  Similar  results  hold  whether  the 
legitimate  communicators  are  required  to  agree  on  the 
secret  with  probability  one  or  just  with  high  probability. 

Summary 

The  following  problem  was  introduced  by  Maurer  [1] 
and further investigated by Ahlswede and Csiszar [2].
It  concerns  two  parties  with  some  common  information 
conversing  publicly  to  agree  on  a  secret  key  that  is  un¬ 
known  to  an  eavesdropper  listening  to  their  discussion. 

Let (X,Y,Z) be a sequence $(X_i, Y_i, Z_i)_{i=1}^{n}$ of independent
random triples, each distributed according to a
probability distribution p(x, y, z). Two legitimate users,
Px having X and Py having Y, communicate over a
noiseless  channel  according  to  a  predetermined  protocol 
in  order  to  agree  on  a  joint  key.  An  eavesdropper  Pz 
who  knows  Z  and  the  communication  protocol,  and  has 
access  to  all  the  bits  transmitted  over  the  channel,  tries 
to  determine  the  value  of  the  key. 

The probability distribution $p$ is said to achieve a secrecy
rate $s$ if for every $\epsilon > 0$ there exist $n$, a (possibly
randomized) communication protocol $\Phi$ defined
on X and Y, and a random key $K$, such that: (1)
Px and Py know the key: $H(K \mid X, \Phi(X,Y)) \le \epsilon$ and
$H(K \mid Y, \Phi(X,Y)) \le \epsilon$; (2) Pz does not know the key:
$H(K \mid Z, \Phi(X,Y)) \ge H(K) - \epsilon$; (3) $K$ has a per-letter
entropy of at least $s$: $\frac{1}{n} H(K) \ge s$. The secrecy capacity
$C(p)$ of $p$ is the largest achievable secrecy rate.
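
As a concrete illustration of the three conditions, the sketch below evaluates the relevant entropies for a toy distribution. It assumes a single letter ($n = 1$), X = Y a uniform bit, Z an independent uniform bit, no communication at all, and the key K = X. The helper names are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a pmf given as {value: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def conditional_entropy(joint, target, given):
    """H(target | given) for a joint pmf {outcome: probability}.
    `target` and `given` map an outcome to a hashable value."""
    pair, marginal = defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        pair[(target(outcome), given(outcome))] += p
        marginal[given(outcome)] += p
    return entropy(pair) - entropy(marginal)

# Toy distribution (assumed for illustration): outcomes are (x, y, z)
# with x == y uniform and z an independent uniform bit.
joint = {(x, x, z): 0.25 for x in (0, 1) for z in (0, 1)}
key = lambda o: o[0]          # K = X, no messages exchanged

h_k = conditional_entropy(joint, key, lambda o: None)   # H(K)
h_k_given_x = conditional_entropy(joint, key, lambda o: o[0])
h_k_given_y = conditional_entropy(joint, key, lambda o: o[1])
h_k_given_z = conditional_entropy(joint, key, lambda o: o[2])
print(h_k, h_k_given_x, h_k_given_y, h_k_given_z)
```

In this degenerate case both users know the key exactly, the eavesdropper learns nothing about it, and the key carries one bit per letter, so all three conditions hold with $\epsilon = 0$.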

Determining the secrecy capacity of a given distribution,
or even whether this capacity is positive, seems
difficult, and only weak general bounds are known [1].
For that reason, [2] considered the simpler one-way version
of the problem. The legitimate users are allowed
to transmit only one message (say from Px to Py). For
this restricted case, [2] determined the secrecy capacity
of a probability distribution p in terms of its single-letter
entropies. Yet interaction introduces "memory" to the
problem, and similar results appear unlikely.

In this paper we introduce a gradation of measures
ranging from the one-way capacity of p to its secrecy
capacity. A communication protocol is m-message if it
always calls for at most m transmitted messages. For
example, a two-message protocol may require Px to
transmit a message and then call on Py to respond.
We define the achievable m-message secrecy rates and
the m-message secrecy capacity $C_m(p)$ of p as we did
before, except that the protocols allowed must be
m-message. In particular, $C_1(p)$ is the one-way secrecy
capacity considered by [2], and $C(p) = \lim_{m \to \infty} C_m(p)$.
Using  communication-complexity  techniques  we  find 
a  necessary  and  sufficient  condition  for  the  existence  of 
a  secret  key  (i.e.,  C(p)  >  0).  We  use  this  condition 
to  show  that  C(p)  >  0  implies  C2(p)  >  0,  hence  that 
secrecy  can  be  achieved  if  and  only  if  it  can  be  achieved 
using just two messages. We then show that there are
cases where a single message cannot achieve a secret key
($C_1(p) = 0$), but two or more messages can ($C(p) > 0$).
Therefore,  there  is  a  gap  between  one  message  and  two 
or  more.  Potentially,  one  could  use  the  necessary  and 
sufficient  condition  to  improve  the  general  bounds,  but 
so  far  we  have  not  been  able  to  do  so. 

We  also  consider  the  unambiguous  secrecy  capacity  of 
p  where  Px  and  Py  must  know  the  key  with  probability 
1.  Again,  we  show  that  a  secret  key  exists  if  and  only  if 
it  can  be  achieved  with  just  two  messages,  and  we  give  a 
simple  necessary  and  sufficient  condition.  Additionally, 
we examine the more general case where (X, Y, Z) is
an arbitrary triple of random variables (rather than an
ensemble). We show that when X and Y are uniformly
distributed over their support set and are independent
of Z, the (appropriately modified) capacity is between
/(X;y)-logmin{^(X|Y),/f(yiX)}  and  /(X;y). 
Abstract

In  this  paper  we  consider  cipher  systems  that  provide  both 
secrecy  and  security  for  a  given  number  of  (subsequent) 
transmissions.  We  show  that  there  exists  a  broad  class 
of  situations  in  which  we  can  do  better  (less  key  require¬ 
ment)  than  just  concatenating  a  perfect  secrecy  cipher 
and  an  authentication  code. 

Summary 

One of the important results in Shannon's seminal paper
on secrecy systems is that if the cipher is a perfect
group-operation cipher, i.e., a group-operation cipher in
which the keys are uniformly and independently distributed,
then the system is unconditionally perfect [1]. The
situation in which the eavesdropper is active was considered
by Simmons, who was the first to give bounds on the probability
of a successful impersonation attack, $P_I$, and the probability of
a successful substitution attack, $P_S$. The results of Simmons
deal with the situation in which the (legal) sender
sends only one message. His results were improved and
generalized to the case of multiple use of the channel [3], [4].

A naive solution for obtaining security against the active
eavesdropper's actions in the multiple-use case would
be to select a new key for every new transmission. It follows
from [2] that in the case of L transmissions the total
amount of key is bounded from below by

$H(K) \ge -L \log_2(P_I P_S) \ge -2L \log_2 P_d$,   (1)

where $P_d = \max(P_I, P_S)$ and where $H(K)$ denotes the
average total key uncertainty. However, in general, it was
shown in [4] that one actually has the improved bound

$H(K) \ge -(L + 1) \log_2 P_d$.   (2)
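
To see the size of the gap between the two bounds, the following sketch evaluates $-2L \log_2 P_d$ (a fresh key per transmission) and $-(L+1) \log_2 P_d$ (the improved bound) for assumed values of L and $P_d$. The function names and the numbers are illustrative and not from the paper.

```python
from math import log2

def naive_key_bits(num_transmissions, p_d):
    """Lower bound (1): -2 L log2(P_d), a new key for every transmission."""
    return -2 * num_transmissions * log2(p_d)

def improved_key_bits(num_transmissions, p_d):
    """Lower bound (2): -(L + 1) log2(P_d)."""
    return -(num_transmissions + 1) * log2(p_d)

# Assumed example: L = 4 transmissions, deception probability 1/16.
print(naive_key_bits(4, 1 / 16))     # 32.0 bits
print(improved_key_bits(4, 1 / 16))  # 20.0 bits
```

For large L the naive scheme needs roughly twice as much key as bound (2) requires.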

It is well known that one has a certain amount of security
in a secrecy system if the message source exhibits redundancy.
Using this idea we can obtain security even if the
source exhibits no redundancy by introducing redundant
dummy source letters before the encryption. In the sequel
we assume for simplicity that the source is an M-ary
symmetric memoryless source, i.e., $H(M) = \log_2 M$. We
can show the following result.

Theorem 1: Suppose the source letters are encoded by
an error-control code with rate R < 1 and subsequently
are encrypted by a perfect group-operation cipher.
Then this system provides perfect secrecy and
the probability of a successful impersonation attack,


P[,  satisfies  Pi  >  Moreover,  the  same  perfor¬ 

mance  can  be  obtained  for  L  channel  uses  using  a  key 
whose  uncertainty  satisfies  H{K)  >  Ljf  log2  M.  ■ 

This theorem suggests that the design of systems with
secrecy and security would be simple. Unfortunately,
it can be shown that we may have $P_S = 1$.

Let us for simplicity assume that we want $P_I = P_S$ for
every transmission. We can prove the following.

Theorem 2: Suppose we have an A-code that provides
perfect secrecy and $P_d = P_I = P_S$ for one transmission.
If the A-code allows for an encoding rule updating
scheme as discussed in [5], then we have a cipher
system that provides perfect secrecy for a discrete
memoryless M-ary source, and for which $P_d = P_I = P_S$
is equal for L subsequent transmissions, such that
the key uncertainty $H(K)$ satisfies

$H(K) \ge L \left( H(M) + \log_2 \frac{1}{P_d} \right) - \left( H(M) - \log_2 \frac{1}{P_d} \right)$.

Note that if one naively concatenates a perfect secrecy
code with the A-code we would obtain
$H(K) \ge L\,H(M) + (L+1) \log_2 \frac{1}{P_d}$. Note also that Theorem 2, albeit for a
special case, in some sense combines Shannon's result on the
key entropy for secrecy systems and the key requirements
induced by the security demands.
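
A short numerical comparison may help here. The sketch below reads the Theorem 2 bound as $L(H(M) + \log_2 \frac{1}{P_d}) - (H(M) - \log_2 \frac{1}{P_d})$ and the naive concatenation as $L\,H(M) + (L+1)\log_2 \frac{1}{P_d}$; both readings are reconstructions of the printed formulas and should be treated as assumptions, as are the function names and the sample values.

```python
from math import log2

def theorem2_key_bits(num_transmissions, source_entropy, p_d):
    """Assumed reading of the Theorem 2 key requirement, in bits."""
    auth = log2(1 / p_d)
    return num_transmissions * (source_entropy + auth) - (source_entropy - auth)

def concatenated_key_bits(num_transmissions, source_entropy, p_d):
    """Perfect-secrecy cipher followed by a separate A-code, in bits."""
    return num_transmissions * source_entropy + (num_transmissions + 1) * log2(1 / p_d)

# Assumed example: 16-ary source (H(M) = 4 bits), P_d = 1/16, L = 4.
combined = theorem2_key_bits(4, 4.0, 1 / 16)
naive = concatenated_key_bits(4, 4.0, 1 / 16)
print(combined, naive, naive - combined)
```

Under this reading the saving over naive concatenation is exactly $H(M)$ bits, independent of L and $P_d$.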
Introduction

Positioning systems are devices that measure the position of remote
objects. Examples include radars, sonars, the Global Positioning System
(GPS) [4] and vehicle tracking systems [2]. A recently published
monograph [1] describes a unified approach to the analysis of positioning
systems. The major element of this approach is Shannon theory. In that
monograph, it is shown that this approach can be used to establish a
performance measure for positioning systems (based on the average
mutual information), establish a limit to that performance (using a
generalisation of Shannon's capacity theorem), derive general theorems
about positioning systems, calculate a source information rate for the
objects being monitored and optimise aspects of the system performance.
The analysis presented in that monograph covers both multi-link and
multidimensional [3] channels for the case of additive white Gaussian
noise (AWGN). Although the AWGN assumption does allow insight into the
functioning of positioning systems, it does limit the applicability of
the analysis.

Information theoretic analysis of conventional communications systems
also started using the AWGN assumption; however, much work has been
carried out to derive results for more realistic channel models. This paper
shows how a multi-link and multidimensional positioning system can be
partitioned into its various component parts, so allowing the results from
conventional communication theory to be directly applied to positioning
systems.

Analysis 

The essence of the paper is to prove a theorem which establishes that the
average mutual information of a multi-link or multidimensional channel is
equal to the sum of the average mutual information of each individual
channel minus a term which represents the degree of interaction between
the channels (the following symbols are defined in [1]), i.e.

$I(x_1, x_2; y_1, y_2) = I(\xi_1; \phi_1) + I(\xi_2; \phi_2) - I(\phi_1; \phi_2)$   (1)

This equation is easily generalised to more than two dimensions, though
the form of the last term changes somewhat.

Equation (1) holds under the condition that the noise is independent
between channels, e.g. the noise on one link is independent of the other
links. It can then be shown that, provided the measurement error is small
and the co-ordinate system is not highly correlated, the degree of
interaction depends almost entirely on the geometric nature of the
positioning system's co-ordinate system, i.e.

$I(\phi_1; \phi_2) = I(\xi_1; \xi_2)$   (2)

This means that the overall performance of a system can be estimated
from the individual single channel performance (already well explored for
conventional communication systems) together with this geometric term.
This term will be denoted $S_g$, i.e.

$S_g = I(\xi_1; \xi_2)$.

Results 

The paper goes on to give examples of the nature of this geometric term
for radial-radial, angle-angle and polar positioning systems. A
radial-radial system measures position by calculating the intersection of two
circles. These circles are often derived from ranging measurements. An
angle-angle system measures position by calculating the intersection of
two lines. Two direction finding stations perform this type of operation.
Simple radar systems operate with a polar co-ordinate system.

The geometric term $S_g$ was calculated for a radial-radial, angle-angle
and polar system. In each case a rectangular a priori p.d.f. of width 2a and
height a was used, centred around the y-axis with the bottom edge aligned
along the x-axis. The radial-radial and angle-angle systems had reference
sites at (-0.5, 0) and (+0.5, 0). For each of these three systems, $S_g$ was
calculated as a function of a, using Mathematica. This involved using a
combination of numerical and analytical integration. The result is shown
in Figure 1.


The system with the lowest value of $S_g$ was the polar system. This is to
be expected as the radial and angular co-ordinates in a polar system are
almost independent. Indeed, if the a priori p.d.f. is circular then $S_g = 0$. On
the  other  hand  both  the  radial-radial  and  angle-angle  systems  are  clearly 
not  independent  and  have  large  values  of  Sg.  In  the  case  of  the  radial- 
radial  system  this  is  because  the  determination  of  the  first  radial 
measurement  constrains  the  second  measurement  to  those  radii  which  will 
intersect  with  the  circle  prescribed  by  the  first  measurement.  Similar 
reasoning  applies  to  the  angle-angle  system. 

At first sight the values of $S_g$ for the radial-radial and angle-angle systems
do not seem to be large, but it should be remembered that as a rule of
thumb,  a  one  bit  reduction  in  performance  will  be  translated  to  a  halving 
in  system  accuracy,  so  a  three  bit  loss  will  cause  an  eight  fold  reduction  in 
accuracy.  Note  that  both  the  angle-angle  and  radial-radial  systems  have 
very  large  values  of  Sg  when  a  is  either  small  or  large.  This  result  should 
be  treated  with  caution  because  a  large  value  of  Sg  means  that  the  co¬ 
ordinates  are  becoming  highly  correlated.  This  means  that  one  of  the 
assumptions  used  in  deriving  the  basic  equation  will  not  be  satisfied. 
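
The decomposition and the rule of thumb above can be put into a few lines of arithmetic. The sketch below assumes that the overall information is the sum of the single-link values minus the geometric term, and that every bit lost halves the accuracy; the function names and the sample numbers are hypothetical.

```python
def combined_information(link1_bits, link2_bits, geometric_term_bits):
    """Overall mutual information of a two-link system:
    sum of the single-link values minus the geometric term S_g."""
    return link1_bits + link2_bits - geometric_term_bits

def accuracy_reduction_factor(bits_lost):
    """Rule of thumb: each bit of lost information halves the accuracy."""
    return 2.0 ** bits_lost

# Assumed example: two links of 10 bits each and a geometric term of 3 bits.
print(combined_information(10.0, 10.0, 3.0))   # 17.0 bits overall
print(accuracy_reduction_factor(3.0))          # 8.0, an eightfold reduction
```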

This geometric term can also be contrasted with the Geometric Dilution of
Precision (GDOP) for these systems. Given a properly selected point at
which to calculate the GDOP, the overall trends shown in Figure 1 are
confirmed. A brief example is also provided as to how a systems engineer
might use the results of this paper in the analysis of a positioning system.

Overall, the paper presents a result which will allow systems engineers to
directly apply results from realistic single-link channels to the analysis of
multi-link channels used in positioning systems. In addition, the analysis
allows a deeper understanding of the comparative performance of
different types of systems.
Abstract: We examine the problem of decoding a
linear block code used over a binary symmetric channel
when the goal is to minimize the average information bit
error probability. For fixed crossover probability p, the
optimal decoder can be implemented by a standard array.
We present optimal strategies for choosing coset leaders
in the very quiet ($p \to 0$) and very noisy ($p \to 1/2$) limits.

Summary 

We consider the problem of decoding a binary (n,k) linear
block code C used over a binary symmetric channel (BSC) with
error probability p < 1/2. We assume that an information vector
u is chosen at random from U, the set of all binary k-tuples, with
each element of U having equal probability of being chosen. The
encoder transmits the codeword c = uG across the BSC, where
G is a generator matrix of the code.

If the goal of the decoding is to minimize the average probability
of a codeword error, then the well known solution is to use
a standard array decoder with minimum weight coset leaders. In
some applications, however, we might be more interested in
minimizing the average information bit error probability $P_{\mathrm{inf}}$, given
by

$P_{\mathrm{inf}} := \frac{1}{k} E\left[\, |u + \hat{u}(r)| \,\right]$

where $\hat{u}(r)$ denotes the estimate of u given the received vector r,
and |x| denotes the Hamming weight of x.

The decoding problem in the very quiet limit $p \to 0$ has been
examined before when G is systematic, for example in [3], [4]. The
problem of choosing an optimal generator matrix under certain
constraints is discussed in [2], [5]. Here we make no particular
assumptions about the optimality of G.

In general, for a particular code there will be several different
strategies, each corresponding to the optimal decoding rule over
some range $p \in [p_{i-1}, p_i]$. As pointed out in [1], each of these
strategies can be implemented by a standard array. For example,
Figure 1 shows $P_{\mathrm{inf}}(p) - p$ for the (15,7) BCH code with a
systematic generator matrix and optimal decoding. Each $p_i$ is a root of
a polynomial of the form


where $C_i^0$ ($C_i^1$) is the set of codewords that can be transmitted
when the ith bit of u is a zero (one).

After some manipulations, we find that the optimal $P_{\mathrm{inf}}$ is


(i-pr 


q€0  U6M 


|q.).uG| 


where Q is any choice of coset representatives. For fixed p, the
optimal estimate of the information vector is

$\hat{u}(r) = (r + l)\, G^{-1}$

Figure 1: $P_{\mathrm{inf}}(p) - p$ for the (15,7) BCH code. The optimal
decoding rule may be one of five different strategies depending on the
value of p.


where $G^{-1}$ is the right inverse of G and $l$ is the optimal coset
leader, which is the element of the coset r + C that minimizes


z 

ter+C 


Theorem 1 For all $p \in [0, p_1]$, the following coset leader selection
strategy minimizes $P_{\mathrm{inf}}$:

1. Let $t_1, \ldots, t_j$ denote the minimum weight elements in r + C.
For each such $t_i$, compute $g_i := t_i G^{-1}$.

2. Let y be a binary vector of length k. If the majority of the
$g_1, \ldots, g_j$ have a 1 (0) in the ith position, then set $y_i$ equal to 1 (0). If
no majority exists in the ith position, then repeat steps 1-2
for the ith position using the coset elements of next higher
weight.

3. The optimal coset leader is $l = z(r) + yG$, where z(r) is the
element in r + C having all zeros in the first k positions.

The above strategy is different from that presented in [4]. If a
coset has a unique minimum weight vector, then this vector is the
optimal coset leader when $p < p_1$.
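
The three steps can be sketched in code for a small systematic code. The sketch below assumes G = [I_k | P], so that multiplying by the right inverse of G simply keeps the first k coordinates, and it reads "next higher weight" as the next weight level on its own rather than a cumulative set; the fallback to 0 when every level ties is also an assumption. All function names are illustrative.

```python
from itertools import product

def encode(u, G):
    """Codeword uG over GF(2); G is a list of k rows, each of length n."""
    n = len(G[0])
    return tuple(sum(u[i] * G[i][c] for i in range(len(G))) % 2 for c in range(n))

def add(a, b):
    """Componentwise sum over GF(2)."""
    return tuple((x + y) % 2 for x, y in zip(a, b))

def quiet_limit_leader(r, G):
    """Coset leader selection for the very quiet limit, systematic G assumed."""
    k = len(G)
    coset = [add(r, encode(u, G)) for u in product((0, 1), repeat=k)]
    weights = sorted({sum(t) for t in coset})
    y = []
    for i in range(k):
        bit = 0  # fallback if every weight level ties (assumption)
        for w in weights:
            level = [t for t in coset if sum(t) == w]
            ones = sum(t[i] for t in level)  # i-th bit of t G^{-1} is t[i]
            if 2 * ones != len(level):       # a strict majority exists
                bit = 1 if 2 * ones > len(level) else 0
                break
        y.append(bit)
    z = add(r, encode(r[:k], G))             # coset element with zeros in first k positions
    return add(z, encode(tuple(y), G))

# Systematic (7,4) Hamming code, used here only as a small test case.
G_HAMMING = [
    (1, 0, 0, 0, 1, 1, 0),
    (0, 1, 0, 0, 1, 0, 1),
    (0, 0, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]
print(quiet_limit_leader((1, 0, 0, 0, 0, 0, 0), G_HAMMING))
```

For the Hamming code every coset has a unique minimum weight vector, so the procedure returns that vector, in line with the remark above.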


Theorem 2 Suppose that the generator matrix is systematic, i.e.,
$G = [I_k\ P]$, where $I_k$ is the k x k identity matrix. The following
strategy minimizes $P_{\mathrm{inf}}$ in the very noisy region $p \in [p_m, 1/2]$. If
the ith column of $I_k$ occurs j times in G, then decode the ith
information bit by treating each of these j positions as a repetition
code and ignoring all other positions of the received vector.


Thus if no column of P is equal to a column of $I_k$, then the
decoding strategy that minimizes $P_{\mathrm{inf}}$ when $p > p_m$ is to ignore
all parity check bits, i.e., the optimal coset leader will have all
zeros in the first k positions. For example, in Figure 1, where a
systematic encoding of the (15,7) BCH code was used, we see that
for $p > p_m \approx 0.32$, we get $P_{\mathrm{inf}} = p$.
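
The very noisy strategy is simple enough to write down directly. The sketch below decodes each information bit by a majority vote over the positions of G whose column equals the corresponding unit vector. Breaking ties in favour of the systematic position is an assumption made here for definiteness, and the function name and the small example matrix are hypothetical.

```python
def noisy_limit_decode(r, G):
    """Decode each information bit from the positions of G whose column
    equals the i-th unit vector, ignoring all other received positions."""
    k, n = len(G), len(G[0])
    u_hat = []
    for i in range(k):
        unit = tuple(1 if row == i else 0 for row in range(k))
        positions = [c for c in range(n)
                     if tuple(G[row][c] for row in range(k)) == unit]
        ones = sum(r[c] for c in positions)
        if 2 * ones == len(positions):
            u_hat.append(r[i])  # tie: use the systematic position (assumption)
        else:
            u_hat.append(1 if 2 * ones > len(positions) else 0)
    return tuple(u_hat)

# Hypothetical (5,2) code in which the first unit column appears three times.
G_EXAMPLE = [
    (1, 0, 1, 1, 1),
    (0, 1, 0, 0, 1),
]
print(noisy_limit_decode((0, 1, 1, 1, 0), G_EXAMPLE))
```

When no column of P repeats a column of the identity, the function reduces to reading off the first k received bits, which is the "ignore all parity check bits" rule described above.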
Abstract

Many coded modulation constructions are obtained as some
restricted subset of an infinite constellation (IC) of points in the
n-dimensional Euclidean space, for example, lattice codes. We
shall consider an IC as a code without restrictions employed for
the AWGN channel. We construct exponential upper and lower
bounds for the decoding error probability of an IC as functions
of the generalized SNR $\mu = \gamma^{-2/n} / \sigma^2$, where $\gamma$ is the density of
the IC (the number of points per unit of volume) and $\sigma^2$ is the dispersion
of the AWGN. The upper bound is obtained by means of a
random coding method and it is very similar to the usual random
coding bound for the AWGN channel. The exponents of these
upper and lower bounds coincide for lower values of $\mu$. We show
also that the exponent of the random coding bound for the
ensemble of all possible ICs with the fixed density $\gamma$ coincides
with the exponent for the ensemble of linear ICs, i.e. lattices. We
conclude from this fact that lattices have the same meaning with
respect to an AWGN channel as linear codes have with respect to
a discrete symmetric channel without memory.

Summary 


In recent years several efficient codes for a channel
with additive white Gaussian noise (AWGN) were constructed by
means of coded modulation methods. Many coded modulation
constellations were obtained as some restricted subset of an
infinite constellation (IC) of points in the n-dimensional
Euclidean space, for example lattice codes [1], [2]. Obviously, a
good code can be attained only from a good IC. Furthermore, the
decoding error probability of the code is often estimated by means
of the parameters of the IC from which this code is obtained.

Any countable set $S = \{s_1, s_2, \ldots\}$ of points in the
n-dimensional Euclidean space $E_n$ will be called an infinite
constellation (IC) of length n. Let $V_n(r, s)$ be the n-dimensional
sphere of radius r centered at the point s. Denote
$V_n(r) = V_n(r, 0)$. Let $M_n(S, r) = |S \cap V_n(r)|$, where $|A|$ is the
cardinality of the set A. The limit

$\lim_{r \to \infty} \frac{M_n(S, r)}{|V_n(r)|} = \gamma$,

if it exists, is called the density of S (here $|V_n(r)|$ is the volume of
the sphere $V_n(r)$). An IC for which the density $\gamma$ exists is called
a regular IC. We shall consider a regular IC as a code without
restrictions for the AWGN channel. The value $\mu = \gamma^{-2/n} / \sigma^2$, where
$\sigma^2$ is the dispersion of the AWGN, is called the generalized SNR.

Using random coding arguments [2], [3], we derive the
following theorems.

Theorem 1. Let

$E_u(\mu) = \begin{cases} \dfrac{\mu}{16\pi e}, & \mu \ge 8\pi e, \\ \dfrac{1}{2} \ln \dfrac{\mu}{8\pi}, & 4\pi e \le \mu < 8\pi e, \\ \dfrac{\mu}{4\pi e} - \dfrac{1}{2} \ln \dfrac{\mu}{2\pi}, & 2\pi e \le \mu < 4\pi e, \end{cases}$

and let $o(n)$ be a sequence of reals such that $\lim_{n \to \infty} o(n) = 0$. Then

i) there is a sequence of infinite constellations $S_n$, n = 1, 2, ... (n
is the dimension of the Euclidean space) such that the average decoding
error probability $\lambda(S_n)$ of $S_n$ satisfies the following asymptotic
inequality: $-\frac{1}{n} \ln \lambda(S_n) \ge E_u(\mu) + o(n)$;

ii) for any infinite constellation $S_n$, $-\frac{1}{n} \ln \lambda(S_n) \le E_l(\mu) + o(n)$,
$\mu \ge 2\pi e$;

iii) for any sequence of ICs $S_n$, n = 1, 2, ..., such that $\mu < 2\pi e$,
$\lim_{n \to \infty} \lambda(S_n) \ge 0.5$.

Theorem 2. There is a sequence of lattices $G_n$, n = 1, 2, ...
(n is the dimension of the Euclidean space) such that the
decoding error probability $\lambda$ of $G_n$ satisfies the following
asymptotic inequality: $-\frac{1}{n} \ln \lambda(G_n) \ge E_u(\mu) + o(n)$.
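
The random coding exponent can be evaluated numerically. The sketch below assumes the piecewise reading $E_u(\mu) = \mu/(16\pi e)$ for $\mu \ge 8\pi e$, $\frac{1}{2}\ln(\mu/8\pi)$ for $4\pi e \le \mu < 8\pi e$, and $\mu/(4\pi e) - \frac{1}{2}\ln(\mu/2\pi)$ for $2\pi e \le \mu < 4\pi e$, which is a reconstruction of the printed formula. The function name is illustrative.

```python
from math import e, log, pi

def random_coding_exponent(mu):
    """E_u(mu) under the assumed piecewise form; defined for mu >= 2*pi*e."""
    if mu < 2 * pi * e:
        raise ValueError("exponent is only defined for mu >= 2*pi*e")
    if mu < 4 * pi * e:
        return mu / (4 * pi * e) - 0.5 * log(mu / (2 * pi))
    if mu < 8 * pi * e:
        return 0.5 * log(mu / (8 * pi))
    return mu / (16 * pi * e)

# The exponent vanishes at the threshold and the pieces join continuously.
for factor in (2, 4, 8, 16):
    print(factor, random_coding_exponent(factor * pi * e))
```

Under this reading the three pieces agree at $\mu = 4\pi e$ and $\mu = 8\pi e$, and the exponent is zero at $\mu = 2\pi e$, which is a useful sanity check on the reconstruction.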

It is well known [4] that in the case of discrete symmetric
channels without memory the random coding exponents for the
ensemble of all codes and for the ensemble of linear codes
coincide. It follows from Theorem 2 that the same holds
in the AWGN channel case for codes without restrictions.
The question of whether the random coding exponents for the
ensemble of all codes and for the ensemble of linear codes
coincide also for any additive noise continuous channel without
memory remains open.
Noncoherent detection schemes are extensively used when it is
difficult to establish or maintain an accurate carrier phase [1]-[2]. We
present a noncoherent coded system based on BPSK modulated
convolutional codes which bridges the performance gap with respect to
coherent coded systems by making use of a noncoherent decoding metric
which incorporates an observation interval of several channel signals.
The discrete time channel model considered in this paper is given by

$r_i = s_i e^{j\theta} + w_i, \quad i \in \mathbb{Z}^+$   (1)

where $s_i = \pm\sqrt{E_s}$ and $r_i$ are the transmitted and the received signals,
respectively. The noise $w_i$ is a sample of an independent and identically
distributed sequence of complex Gaussian random variables with
zero mean and variance $N_0/2$ in each dimension. The carrier phase
$\theta$ is assumed to remain constant over L channel signals and to be
uniformly distributed in the interval $[-\pi, \pi)$. For a rate k/n
convolutional code, the suboptimal noncoherent branch metric calculated for
a subsequence of L = Jn signals is given by


1  = 


j=i 


(2) 


The parameter L is referred to as the length of the observation
interval. The metric of an entire code sequence is given by the sum of
metrics of its constituent L-long subsequences. Since L is a multiple of
n, the metric is calculated over an integral number (J) of branches in
the trellis diagram of the code. Therefore, for an arbitrary J and for
a given number of states, decoding is easily accomplished by using a
conventional rate ^ Viterbi decoder with the same number of states
and a branch metric given by (2). Note that since the metric (2)
is calculated separately for each subsequence without any regard to
previous subsequences, the error performance of the system would be
the same whether $\theta$ changes arbitrarily once every L signals or remains
constant forever.


The Chernoff bounding technique is employed to obtain upper
bounds on the pairwise error probability and the average bit error
probability, and a simple expression for the generalized cut-off rate
[3]. Large deviations techniques are used to find the exponential rate
of the error probability of the proposed system. This parameter leads
to the definition of the equivalent free distance of the underlying
convolutional codes of the noncoherent system. Upper bounds on the free
distance are provided as well.

The metric in (2) raises the problem of phase ambiguity since it
is invariant to a 180° rotation of an L-long subsequence. Conventionally,
this problem is resolved by using a reference signal and differential
encoding and decoding [1]-[2]. In our approach, however, the phase
ambiguity problem is resolved as an inherent part of the coding system in
a general framework of catastrophic error propagation. Nevertheless,
there are close relations, depending on the carrier phase model,
between both approaches. It is shown that for a model of carrier phase
changing arbitrarily every L channel signals, the proposed system is
equivalent to appropriate differential systems, and for a constant
carrier phase, the proposed system constitutes the natural framework for
analyzing and synthesizing standard differentially encoded systems [2].
In particular, it is concluded that known optimal codes for coherent
detection, namely those codes which achieve large Hamming distance,
are not necessarily optimal for various differential systems as long as
the observation interval is longer than two. This fact is demonstrated
by the bounds on the pairwise and bit error probability and verified
by the equivalent free distance of specific codes found by a computer
search.
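
The phase ambiguity can be illustrated numerically. The printed form of metric (2) did not survive, so the sketch below assumes a commonly used block metric, the magnitude of the correlation between the received block and a candidate BPSK block; this is an assumption and not necessarily the authors' metric. Function names and sample values are hypothetical.

```python
import cmath

def noncoherent_metric(received, candidate):
    """|sum_j r_j * conj(s_j)| over one L-long block (assumed metric form)."""
    return abs(sum(r * s.conjugate() for r, s in zip(received, candidate)))

def rotate(block, theta):
    """Apply a common carrier phase rotation to every signal in the block."""
    return [x * cmath.exp(1j * theta) for x in block]

# Noise-free example with L = 4 and unit-energy BPSK signals.
sent = [complex(1), complex(-1), complex(1), complex(1)]
received = rotate(sent, 0.7)                    # unknown carrier phase
inverted = [-s for s in sent]                   # 180 degree rotated candidate

print(noncoherent_metric(received, sent))       # close to 4
print(noncoherent_metric(received, inverted))   # same value: phase ambiguity
```

Because the metric depends only on a magnitude, a candidate block and its sign-inverted version score identically, which is the ambiguity that the text resolves within the code structure or by differential encoding.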
A theoretical performance analysis has been conducted for the
reception of noncoherent frequency shift keying (NCFSK) over a fading
channel. The receiver is a bank of energy detectors, one energy detector
for each frequency in the NCFSK signal set. The tones used in this set are
assumed to be orthogonally spaced. The key aspect of the study is that
antenna diversity is considered and the fading on the diversity branches
is assumed to be correlated. A general number of diversity branches is
considered.  The  signal  set  is  '  4-  or  8-Ary  NCFSK.  The  fading  is 

assumed to be flat and to vary slowly in time. Both the Rayleigh and
Rician fading models are treated.

Two diversity combining rules are considered. In square-law
combining the outputs of the various diversity branches for each energy
detector are weighted and summed. In selection diversity the diversity branch
with the largest signal-to-noise ratio is the branch chosen for NCFSK
detection.

We  first  discuss  our  results  for  square-law  combining.  It  is  well 
known  that  a  Rayleigh  random  variable  can  be  regarded  as  the  magnitude 
of  a  complex  Gaussian  random  variable.  In  our  correlation  matrix  the 
real  parts  of  these  variables  on  different  branches  are  assumed  to  be 
correlated.  The  same  is  true  of  the  imaginary  terms.  However,  the 
real  and  imaginary  parts  are  always  assumed  to  be  uncorrelated.  The 
detection  random  variable  is  a  sum  of  squares  of  these  variables.  This 
sum  is  then  transformed  to  another  sum  of  squares,  but  now  the  terms 
are independent. The probability of error is then expressed as an
integral of at most two dimensions in the transformed random variables. This
diagonalization technique works with any correlation matrix, Rician or
Rayleigh fading, and with any practical order of diversity L.
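
The diagonalization step can be illustrated for two branches. The sketch below assumes unit-variance Gaussian components with correlation coefficient rho, for which the correlation matrix [[1, rho], [rho, 1]] has eigenvalues 1 + rho and 1 - rho; an orthogonal change of variables leaves the sum of squares unchanged while making the new terms uncorrelated (and hence independent, for jointly Gaussian variables). The two-branch restriction and the function names are assumptions made for illustration.

```python
from math import sqrt

def diagonalize_2x2(rho):
    """Eigenvalues and orthonormal eigenvectors (as columns of Q) of the
    correlation matrix [[1, rho], [rho, 1]]."""
    a = 1 / sqrt(2)
    return (1 + rho, 1 - rho), ((a, a), (a, -a))

def transform(x, q):
    """y = Q^T x for a 2-vector x."""
    return tuple(sum(q[r][c] * x[r] for r in range(2)) for c in range(2))

def transformed_covariance(rho, q):
    """Q^T R Q, which should be diagonal with the eigenvalues on the diagonal."""
    R = ((1, rho), (rho, 1))
    RQ = [[sum(R[i][m] * q[m][j] for m in range(2)) for j in range(2)]
          for i in range(2)]
    return [[sum(q[m][i] * RQ[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

eigenvalues, Q = diagonalize_2x2(0.6)
x = (1.5, -0.4)                     # arbitrary sample of the two branches
y = transform(x, Q)
print(sum(v * v for v in x), sum(v * v for v in y))  # equal sums of squares
print(transformed_covariance(0.6, Q))                # approximately diag(1.6, 0.4)
```

For more than two branches the same idea applies with a general eigen-decomposition of the correlation matrix.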


The  research  contained  herein  was  funded  by  the  Department  of  Commu¬ 
nications,  Contract  #36001-0-3505/01-55. 
Samples of our calculations are shown in Figures 1 and 2, below.
In Figure 1 we show how performance in Rician fading degrades with
correlated branch diversity and square-law combining. Figure 2 repeats the
situation for the Rayleigh case. The loss due to correlation is greater for
Rician than for Rayleigh. Thus, in a correlated diversity environment,
performance in Rayleigh fading can be better than in Rician fading. This can
never occur for uncorrelated fading.

In the result just discussed the correlation coefficient between
diversity branches was the same. The weighting on branches was equal. No
matter what the distribution of correlation coefficients between branches,
equal weighting per diversity branch was always found to be best.

We have also considered selection diversity combining and compared
it to square-law combining. It was found that selection combining is
inferior to square-law combining in correlated diversity situations. A
report has been written on our research and is referenced as [1] below.
A list of previous research on correlated diversity branches is given
in [1]. Finally, Mazo's matched filter lower bound [2] for two-beam,
frequency-selective fading involves a quadratic form. We have applied
our diagonalization method to it and rederived Mazo's result. The
derivation is given in [1].
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E/N„  (dB) 

Figure  1;  BER  for  A'  =  dB  Rician  and  L  -  -1. 


E/N.  (dB) 

Figure 2: BER for K = 0 and L = 4.
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Abstract:-  This  paper  addresses  the  problem  of  determining  the 
undetected error probability, $P_u(\epsilon)$, for linear $(n,k)$ block codes on channels with memory. In the past, $P_u(\epsilon)$ was investigated mainly on memoryless channels, such as the binary symmetric channel (BSC). We present two techniques for determining $P_u(\epsilon)$, both of which employ trellis diagrams. The first technique is based upon a trellis diagram of the states of a channel model such as the Gilbert-Elliott or Fritchman channel models. The second technique takes into account the trellis diagram of the syndrome register of a code as well as the stationary and transition probabilities of any of the aforementioned channel models. Results indicate that in many cases $P_u(\epsilon)$ for codes on channels with memory far exceeds $P_u(\epsilon)$ on memoryless channels for the same code. This makes it very important to be able to calculate $P_u(\epsilon)$ on channels with memory, seeing that $P_u(\epsilon)$ on the BSC certainly does not represent an upper bound. We also show that the often assumed upper bound on $P_u(\epsilon)$, $2^{-(n-k)}$, is exceeded on channels with memory. The first technique that we present is applicable to short or low rate codes, while the second can be used with high rate or long codes.

SUMMARY 

Until Leung & Hellman [1] proved differently, the undetected error probability $P_u(\epsilon)$ for linear $(n,k)$ block codes was assumed to be upper bounded by $2^{-(n-k)}$. In papers published after this contribution of Leung and Hellman [1], various classes of codes were investigated with respect to probability of undetected error. This was done in order to determine which codes are proper and which are improper. Proper codes are those for which $P_u(\epsilon)$ is a monotonically increasing function of $\epsilon$ over $0 < \epsilon < 0.5$. Codes for which $P_u(\epsilon)$ is not a monotonic function of $\epsilon$ over $0 < \epsilon < 0.5$ are termed improper. It follows that for proper codes $P_u(\epsilon)$ is always bounded by $2^{-(n-k)}$ [1]. However, in investigations published previously, it was always assumed that errors occurred independently, i.e. that the channel used is the Binary Symmetric Channel (BSC).

On many real communication channels, such as the switched telephone network and radio links, errors do not occur independently but in bursts [2]. The equation

$$P_u(\epsilon) = \sum_{j=1}^{n} A_j\,\epsilon^{j}(1-\epsilon)^{n-j}, \qquad (1)$$

with $A_j$ the weight enumerator of the $(n,k)$ block code, only holds for the determination of $P_u(\epsilon)$ on channels without memory, i.e. the BSC [3].

With this paper we intend to present techniques aimed at determining $P_u(\epsilon)$ for linear cyclic block codes on channels modelled by the well-known Fritchman and Gilbert-Elliott channel models [2].

When determining $P_u(\epsilon)$ for codes on channels with memory, it is of utmost importance to know the positions of errors. Therefore, calculating $P_u(\epsilon)$ of a code on a channel with memory compels one to take into account the underlying error mechanism present on the channel. The error mechanism is typically modelled by means of discrete Markov chains such as the Gilbert-Elliott and Fritchman channel models.

In the first technique developed, we construct a trellis diagram of the states of a channel model such as the Gilbert-Elliott or Fritchman channel models. The length of the trellis is equal to the length of a codeword, $n$. Assume that the transmitted codeword is $\mathbf{v}$ and the received word is $\mathbf{r}$ with the error vector being $\mathbf{e}$, giving $\mathbf{r} = \mathbf{v} + \mathbf{e}$. Therefore, for a linear block code, whenever $\mathbf{e}$ is equal to a valid nonzero codeword, $\mathbf{r}$ is also a valid codeword. This is exactly what takes place whenever an undetected error occurs. This technique determines the probability of $\mathbf{e}$ being any one of the nonzero codewords of a code by determining the probability of occurrence of each codeword within a code except for the all-zero codeword.

The second technique which we present involves the construction of a trellis diagram representing the states of the syndrome register of the code. The length of the trellis is also equal to the number of bits in a codeword, $n$. Syndrome calculation is usually performed in order to determine whether a received word is a codeword or not. Whenever a received word is not a codeword, the syndrome associated with it is non-zero, the syndrome being zero only if the received word is a valid codeword. This principle is the basis of this technique for the determination of $P_u(\epsilon)$. After construction of the trellis diagram of the syndrome register states, all paths leading from the all-zero state back to the all-zero state in a number of transitions equal to the codeword length, $n$, are retained. The remaining paths, which terminate in non-zero states after $n$ transitions, are discarded. The paths retained in this way can now be associated with all valid codewords of a code. This reduced trellis diagram can now be used in conjunction with any binary channel model to determine $P_u(\epsilon)$ for the code. The advantage of this technique is that $P_u(\epsilon)$ can be determined easily for very long codes. It furthermore removes the need to know the weight spectrum of the code. The limiting factor in this case is the length of the syndrome register.

The first technique takes all codewords into account, making it usable with smaller and very low rate codes, because considering all codewords soon becomes very complex for larger high rate codes. The second technique is usable with high rate codes, seeing that not all codewords are considered and the complexity in this case is linear and not exponential as in the first technique.
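
The two techniques summarized in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the code is the (7,4) Hamming code, the channel is a two-state Gilbert-Elliott model with made-up parameters, and the channel is assumed to start each codeword in its stationary distribution. The first function enumerates the nonzero codewords as error patterns; the second propagates probability mass over (syndrome, channel state) pairs and keeps only the paths that end in the zero syndrome.

```python
from itertools import product

# Parity-check columns of the (7,4) Hamming code as 3-bit integers (illustrative choice).
H_COLS = [1, 2, 3, 4, 5, 6, 7]
N = len(H_COLS)

def syndrome(word):
    """XOR of the parity-check columns selected by the nonzero bits of word."""
    s = 0
    for bit, col in zip(word, H_COLS):
        if bit:
            s ^= col
    return s

def stationary(T):
    """Stationary distribution of a two-state chain with transition matrix T."""
    g_to_b, b_to_g = T[0][1], T[1][0]
    return [b_to_g / (g_to_b + b_to_g), g_to_b / (g_to_b + b_to_g)]

def pattern_probability(e, T, pe):
    """Probability that the channel produces error pattern e (forward recursion)."""
    alpha = stationary(T)
    for bit in e:
        emit = [pe[s] if bit else 1.0 - pe[s] for s in (0, 1)]
        alpha = [sum(alpha[s] * emit[s] * T[s][t] for s in (0, 1)) for t in (0, 1)]
    return sum(alpha)

def pu_by_codewords(T, pe):
    """Technique 1: add up the probabilities of all nonzero codewords as error patterns."""
    return sum(pattern_probability(e, T, pe)
               for e in product((0, 1), repeat=N)
               if any(e) and syndrome(e) == 0)

def pu_by_syndrome_trellis(T, pe):
    """Technique 2: propagate probability mass over (syndrome, channel state) pairs."""
    pi = stationary(T)
    mass = {(0, s): pi[s] for s in (0, 1)}
    for col in H_COLS:
        nxt = {}
        for (syn, s), m in mass.items():
            for bit in (0, 1):
                emit = pe[s] if bit else 1.0 - pe[s]
                new_syn = syn ^ col if bit else syn
                for t in (0, 1):
                    key = (new_syn, t)
                    nxt[key] = nxt.get(key, 0.0) + m * emit * T[s][t]
        mass = nxt
    # Paths ending in the zero syndrome are codewords; remove the error-free path.
    zero_syndrome = mass.get((0, 0), 0.0) + mass.get((0, 1), 0.0)
    return zero_syndrome - pattern_probability([0] * N, T, pe)

def pu_bsc(eps):
    """Memoryless formula with the Hamming (7,4) weights A_3 = A_4 = 7, A_7 = 1."""
    weights = {3: 7, 4: 7, 7: 1}
    return sum(a * eps**j * (1 - eps)**(N - j) for j, a in weights.items())

# Hypothetical channel: state 0 is "good", state 1 is "bad".
T = [[0.99, 0.01], [0.10, 0.90]]
pe = [0.001, 0.2]
average_error_rate = sum(p * e for p, e in zip(stationary(T), pe))
print(pu_by_codewords(T, pe), pu_by_syndrome_trellis(T, pe), pu_bsc(average_error_rate))
```

The codeword enumeration grows with the number of codewords, $2^k$, whereas the syndrome trellis needs a number of states proportional to $2^{n-k}$ times the number of channel states at each of the $n$ steps, which is the trade-off between the two techniques described above.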
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In this paper, we study the simplified
reception  of  convolutionally  encoded  CPM 
signals.  Our  receiver  is  of  the  Viterbi  type, 
but  the  number  of  receiver  states  is  smaller 
than  that  of  the  optimum  one.  We  use  the 
concept  of  reduced  state  sequence  estimation 
(RSSE),  originally  introduced  by  Eyuboglu  et 
al.  for  the  reception  of  signals  in  the  ISI 
environment. 


Summary

The block diagram of the system considered is shown in Fig. 1, where G is a convolutional encoder, M is a finite state sequential machine that models a CPM modulator, and V is a receiver of the Viterbi type whose number of states is smaller than that of the optimum receiver. In the receiver we use the concept of reduced state sequence estimation (RSSE), originally introduced by Eyuboglu et al. for the reception of signals in the ISI environment [1]. The idea of Eyuboglu was then used by Svensson [2] and Huber [3] for the reception of uncoded CPM signals. In this paper we apply this concept to the reception of convolutionally encoded CPM signals. The receiver operates on a trellis that is reduced compared to the combined trellis of the encoder and the modulator. Following the results of our earlier research [4,5], the unsimplified trellis for coded modulation usually has fewer states than the product of the number of modulator states and the number of encoder states. The states of the unsimplified trellis are grouped into so-called superstates. The channel is assumed to be an ideal Gaussian one.

First  of  all,  we  show  that  the  RSSE 
approach  is  applicable  to  the  convolutionally 
encoded  CPM  signals.  Then,  the  asymptotic 
error  performance  of  selected  coded  CPM 
schemes  is  estimated  by  means  of  equivalent 
Euclidean  distance  calculated  from  a 
simplified  trellis.  Numerical  results  are 
presented  for  TFM  and  MSK  signals  combined 
with  rate-1/2  short  constraint  length 
convolutional  codes.  The  results  are  also 
compared  with  computer  simulation.  The  concept 
of  matched  convolutional  encoding  combined 
with  the  simplified  reception  allowed  us  to 
find  schemes  which,  for  the  same  receiver 
complexity  and  bandwidth,  outperform  the 
schemes  found  so  far  by  values  of  up  to  1.2 
dB. 

Finally, we perform a code-receiver optimization procedure over the set of coded schemes with optimum and suboptimum receivers, which should lead to the scheme with the lowest possible error probability, regardless of whether the receiver turns out to be optimum or not. In that respect we introduce the notion of the so-called "optimum transmitter", which is our search objective instead of the traditionally used "optimum receiver".


For example, Fig. 2 shows two systems that may be compared. In both cases the number of receiver states is the same. System B, even though it employs a suboptimum receiver, would often outperform system A in terms of error performance. Hence, under these new constraints, the optimization moves, to a large extent, to the transmitter side. In most cases examined by us, the most efficient solutions were found not to be based on the optimum receiver but rather on the RSSE receiver with the reduction factor F=2.
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Fig. 1. Block diagram of convolutionally encoded CPM communication system.
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Fig. 2. Comparison of two communication systems.
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This  paper  presents  a  statistical  characterization  of  a  uniformly 
sampled, first-order, decision-directed (DD), digitally implemented, phase-locked loop (DPLL) for MPSK modulations. This architecture is built in an extremely simple fashion and has near ideal coherent performance at moderate to high SNR. The phase detector (PD) presented in this paper has an ideal sawtooth form (ST-PD), but the analysis is easily modified to obtain results for the generalized Costas loops, the Mth power loop [3], or other loops for modulated signals. Figure 1 shows the analytical phase domain model for the decision-directed loop (note: $0 < K \le 1$ is the loop gain).


Figure  1.  Phase  domain  demodulator  model. 


A discrete time Markov chain characterizes the DPLL and the associated PD. The equation describing the loop operation is

$$\hat{\theta}_{n+1} = K(\theta_n - \hat{\theta}_n) + \hat{\theta}_n = h(\theta_n, \hat{\theta}_n). \qquad (1)$$

Eq. (1) is a nonlinear equation since the phase additions and subtractions are modulo-$2\pi$ operations. Assuming the input phase is a white random process (Nyquist prefiltering), $\hat{\theta}_n$ is a first-order discrete-time Markov random process. The Chapman-Kolmogorov equation and an initial distribution function are sufficient to produce a complete statistical description of the loop operation.

A comparison of this Markov analysis with the traditional diffusion approximation is instructive. Traditionally, the analysis of continuous time loops assumes a narrow loop bandwidth and claims that a diffusion approximation is valid [2, 3]. This discrete time analysis does not require a diffusion approximation and provides some advantages in analyzing synchronization systems: no approximations are required, and fast time variations of the phase error (e.g., in wide bandwidth systems) can be analyzed. Table 1 summarizes the differences between the two analytical techniques.


Diffusion Approximation                 Markov Chain
--------------------------------------  --------------------------------------
Continuous time model                   Discrete time model

Fokker-Planck equation                  Chapman-Kolmogorov equation

FP coefficients determined by the       State transition pdf determined by a
phase detector characteristics          transformation of random variables on
                                        the input noise

Valid for small loop bandwidths in      Valid for all loop bandwidths
comparison to the input noise
bandwidth

Valid for any prefilter                 Input must be delta-correlated
                                        (Nyquist prefiltering)

Not valid for time-varying gain         Can examine time-varying loop gain

Not valid for looking at                Can examine symbol-to-symbol phase
symbol-to-symbol phase error            error dependencies
dependencies

Table 1. Comparison of the diffusion approximation and the Markov chain model.

Steady-state performance is easily characterized with traditional Markov techniques. The chain is easily shown to be positive recurrent and asymptotically ergodic. An eigen-decomposition is used to evaluate the steady-state density function and the resulting bit error probability.
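
A rough numerical sketch of the steady-state computation follows. It makes several assumptions that are not in the paper: the input is unmodulated, the phase noise is Gaussian with a hypothetical standard deviation `sigma` (truncated at five standard deviations), the phase error is quantized into `n_states` bins, and plain power iteration stands in for the eigen-decomposition so that only the standard library is needed.

```python
import math

def wrap(angle):
    """Wrap an angle to [-pi, pi), i.e. modulo-2*pi phase arithmetic."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi

def transition_matrix(K, sigma, n_states=48, n_noise=400):
    """Row-stochastic matrix P[i][j]: probability of moving from error bin i to bin j."""
    width = 2.0 * math.pi / n_states
    centers = [-math.pi + (i + 0.5) * width for i in range(n_states)]
    # Discretized zero-mean Gaussian phase noise (an assumed noise model).
    noise = [(-5.0 + 10.0 * (m + 0.5) / n_noise) * sigma for m in range(n_noise)]
    weights = [math.exp(-0.5 * (v / sigma) ** 2) for v in noise]
    total = sum(weights)
    weights = [w / total for w in weights]
    P = [[0.0] * n_states for _ in range(n_states)]
    for i, err in enumerate(centers):
        for v, w in zip(noise, weights):
            detector = wrap(v - err)          # sawtooth PD output
            nxt = wrap(err + K * detector)    # first-order loop update of the error
            j = min(int((nxt + math.pi) / width), n_states - 1)
            P[i][j] += w
    return centers, P

def steady_state(P, iterations=300):
    """Stationary distribution by repeated application of the Chapman-Kolmogorov step."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(iterations):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist

centers, P = transition_matrix(K=0.25, sigma=0.3)
density = steady_state(P)
peak = max(range(len(density)), key=density.__getitem__)
print("most likely phase error bin is centered at", centers[peak])
```

Each pass of the loop in `steady_state` is one Chapman-Kolmogorov update of the quantized phase-error distribution, so running it from a distribution concentrated near 180 degrees instead of the uniform start would trace the acquisition behaviour rather than the steady state.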

* This work was partially supported by the National Science Foundation under Grant NCR-9115820.


The acquisition performance is also easily characterized. This is accomplished using the traditional absorbing boundary/state techniques in Markov analysis. As expected, as $K \to 0$ the performance predicted by the discrete time analysis matches that predicted by the diffusion approximation [4]. Figure 2 shows the evolution of the phase error process during acquisition for a loop with an unmodulated input signal. Note that $t$ is the time normalized by the loop bandwidth and the initial phase error was set to $\phi_0 = 180°$, which corresponds to the unstable attractor or the hangup point [1]. A major difference between the DPLL and the analog PLL acquisition performance is the effect of the hangup anomaly.


Figure 2. Evolution of the phase error process during acquisition ($\phi_0 = 180°$, $K = 0.25$, ST-PD, and SNR$_L$ = 6 dB).

Finally, the cycle slipping performance of the DPLL based demodulator is examined. Again, absorbing boundary techniques and some well known Markov process results [5] permit the characterization of the moments of the time to slip. Figure 3 presents the numerical results of a mean time to slip analysis for QPSK modulation.


Figure 3. Mean time to slip for QPSK modulation versus loop SNR. Sawtooth PD.
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On  the  Applicability  of  the  Fokker- Planck  Method 
in  Telecommunications 
(Summary) 


L.  Popken* 


In the theoretical fields of telecommunications it can be observed that all too often the Fokker-Planck (F-P) method is applied without paying sufficient attention to the physical foundations and to the conditions for the method's actual applicability.

A well-justified system analysis approach takes the actual equations that describe the physical system in question as a starting point, after which one can devise approximations. Often researchers in the technical literature attempt to formulate these approximations as if they were fundamental system descriptions. This then leads to approaches like the F-P equation being applied to processes that do not really satisfy the conditions for the applicability of the F-P method.

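Continuous range Markov processes [1] contain a subclass of processes which, in terms of the evolution of their pdf $p_x(x,t)$, are described by the F-P equation

$$\frac{\partial p_x(x,t)}{\partial t} = -\frac{\partial}{\partial x}\bigl[K_1(x)\,p_x(x,t)\bigr] + \frac{1}{2}\frac{\partial^2}{\partial x^2}\bigl[K_2(x)\,p_x(x,t)\bigr] \qquad (1)$$

with two functions $K_1(x)$, $K_2(x)$ where $K_2(x) > 0$. This subclass of processes is defined by three conditions, of which two are given by

$$K_1(x) = \lim_{\Delta t \to 0} \frac{\langle \Delta x(t) \rangle}{\Delta t} \qquad (2)$$

$$K_2(x) = \lim_{\Delta t \to 0} \frac{\langle (\Delta x(t))^2 \rangle}{\Delta t} \qquad (3)$$

where $x$ is the value of $x(t)$ at any time $t$, $\Delta x(t) = x(t + \Delta t) - x(t)$; the averages $\langle \cdots \rangle$ are taken with fixed $x(t)$. The third condition of

$$\lim_{\Delta t \to 0} \frac{\langle (\Delta x(t))^j \rangle}{\Delta t} = 0, \quad j = 3, 4, \ldots$$

was later replaced by the Lindeberg condition

$$\text{Prob}\{|x(t + \Delta t) - x(t)| > \delta\} = o(\Delta t) \qquad (4)$$

for any $\delta > 0$ [1].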

The validity of the F-P equation is equivalent to the three conditions (2) to (4) being true for the process $x(t)$.

A major question is whether Markov processes with continuous sample paths actually exist in reality. For physical processes the Lindeberg condition (4) is at best satisfied only approximately [1]. Therefore, the second order differential operator in the F-P equation is not a mathematical identity but only an approximation. In order to justify this approximation, a systematic expansion is required which, except for special cases, shows the F-P equation to be, in general, inconsistent because it includes terms of the same order of magnitude as those terms which are neglected by omitting higher order derivatives. Ad hoc prescriptions for cutting off higher moments of the fluctuations often seem to be dictated by (numerical) needs rather than by logic.

Markov processes with continuous sample paths do exist mathematically and can be useful in describing reality, provided that the underlying conditions are proven to be adequately satisfied for the actual system in question.

In reality there is no such thing as a (continuous) Markov process. However, there may be driving processes with memory times so short that, on the time scale of interest, it is appropriate to consider the system process as well approximated by a Markov process. In this case, the question whether the sample paths are continuous or discontinuous is no longer relevant. For deriving the pdf of the actual process, the sample trajectories of the approximating Markov process are certainly not required to be continuous, although the physical process has almost surely continuous sample trajectories. This concept of actual process versus Markov process was developed by Stratonovich in [2], which, however, has often been quite severely misinterpreted, also in the telecommunications literature.

* European Space Research and Technology Centre, ESA/ESTEC, RF Systems Division, XRT, Keplerlaan 1, P.O. Box 299, 2200 AG Noordwijk, The Netherlands.

For physical processes, Stratonovich has developed the pdf $p_1$ as solution of the kinetic equation

$$\frac{\partial p_1(x,t)}{\partial t} = \sum_{i=1}^{\infty} \frac{(-1)^i}{i!}\,\frac{\partial^i}{\partial x^i}\bigl[K_i(x)\,p_1(x,t)\bigr] \qquad (5)$$

where the individual intensity coefficients $K_i(x)$ must be developed by separate expansions. In fact, for the pdf $p_1$ a systematic two-dimensional expansion is required, i.e. the primary expansion w.r.t. $i$ and the secondary expansion per intensity coefficient $K_i(x)$, which is represented by its terms $K_{i,1}(x), K_{i,2}(x), \ldots$, $i = 1, 2, 3, \ldots$ [3].

If the physical process which drives a system has a correlation time $\tau_{cor}$ much shorter than the system time constant $\tau_0$, i.e. $\tau_{cor} \ll \tau_0$, and if the observation time interval $t - t_0$ is much longer than $\tau_0$, then the intensity coefficients $K_i(x)$ can be approximated by their corresponding first expansion terms, i.e. $K_i(x) \approx K_{i,1}(x)$, $i = 1, 2, 3, \ldots$; the $K_i(x)$, $i = 2, 3, \ldots$, become determined by the correlation functions of the actual wide-band driving process. This procedure is formally equivalent to the case of a mathematical, white driving process implying a Markov system process. Although the actual system process has almost surely continuous sample trajectories, it can, in terms of its pdf, formally be replaced, in general, by a discontinuous Markov process [2].

It is the fundamental problem in several publications in the telecommunications area that Stratonovich's work [2] is misinterpreted as though $\tau_{cor} \ll \tau_0$ together with $(t - t_0) \gg \tau_0$ were sufficient conditions for replacing the actual system process by a continuous Markov process to which the F-P equation is then applied, irrespective of the higher order correlation functions of the (non-Gaussian) noise.

The F-P equation (1) can provide correct results, in particular if it is applied to linear approximations; in these cases we restrict the system analysis to those features (such as low order moments of the system process) which coincide with the linear noise approximation. However, it is incorrect, as highlighted in [1], to consider the approach seriously beyond that, for instance to conclude the pdf

$$p_x(x) = \frac{\text{const}}{K_2(x)} \exp\!\left[2\int^{x} \frac{K_1(x')}{K_2(x')}\,dx'\right] \qquad (6)$$

which is the formal solution of the F-P equation (1).
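
For reference, the formal stationary solution can be obtained in a few lines. This is a sketch under two assumptions that the summary does not spell out: that the F-P equation has the standard drift and diffusion form with coefficients $K_1(x)$ and $K_2(x)$, and that the probability flux vanishes at the boundaries.

```latex
\begin{align*}
0 &= -\frac{d}{dx}\bigl[K_1(x)\,p_x(x)\bigr]
     + \frac{1}{2}\frac{d^2}{dx^2}\bigl[K_2(x)\,p_x(x)\bigr]
  && \text{(stationarity, } \partial p_x/\partial t = 0\text{)} \\
0 &= K_1(x)\,p_x(x) - \frac{1}{2}\frac{d}{dx}\bigl[K_2(x)\,p_x(x)\bigr]
  && \text{(integrate once, zero flux)} \\
\frac{d}{dx}\ln\bigl[K_2(x)\,p_x(x)\bigr] &= \frac{2K_1(x)}{K_2(x)}
  && \text{(divide by } \tfrac{1}{2}K_2\,p_x\text{)} \\
p_x(x) &= \frac{\text{const}}{K_2(x)}
          \exp\!\left[2\int^{x}\frac{K_1(x')}{K_2(x')}\,dx'\right]
  && \text{(integrate and exponentiate)}
\end{align*}
```

The result depends only on $K_1$ and $K_2$, which is why it cannot reflect the higher order intensity coefficients of a non-Gaussian driving process.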
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THE  SYNCHRONIZATION  GAME 
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Abstract 

The first, although restricted, solution of the Serial-Search PN Code Acquisition problem in presence of Slowly Time-Varying Fading Channels is herein presented, providing a determinant performance measure for Spread Spectrum Systems under these conditions. As a case study, a white noise, average power constrained, random, symmetric two-state Jammer is analyzed, and the corresponding minimax threshold is determined.

Summary 

The  solution  of  the  PN  Code  Acquisition  problem 
in  presence  of  Time-Varying  Fading  Channels  is  one 
of  the  most  important  open  problems  in  the  area  of 
Communications,  especially  in  light  of  the  present 
trend  towards  mobility. 

The first, although restricted, solution of the Serial Search PN Code Acquisition problem in presence of Slowly Time-Varying Fading Channels which can be discretely approximated is herein presented, thus providing a much needed performance measure for Spread Spectrum Systems under these taxing conditions.

Two distinct approaches led to the exact solution of the approximated (i.e., discretized) problem. One of them, original and making full use of the symbolic computational capabilities now available, was found to be particularly advantageous for large uncertainty regions (i.e., a large number of states in the state transition diagram). A new, quite good approximate solution, requiring far fewer computations, was also found.
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Tecnico,  1096  LISBOA  CODEX,  PORTUGAL. 


The theory is then applied to a White Noise Two-State Jammer which randomly alternates between two equally likely jamming levels in order to satisfy an average power constraint. In this case, no approximation is involved, since the channel is intrinsically discrete.

The results seem to indicate that at high SNRs, the On-Off Jammer, alternating periods of radio silence with periods of total jamming (i.e., reserving all the available power for the jamming occasions), is the worst possible as far as acquisition is concerned. For low SNRs a more elaborate Jammer, alternating between the On-Off mode and the Continuous mode (i.e., jamming at the average power), is the worst possible. The receiver, by appropriately setting its threshold, can now play the minimax synchronization game.
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Recently, there has been considerable interest in multi-pulse pulse position modulation (MPPM), because MPPM reduces the transmission bandwidth to about half that of pulse position modulation (PPM) [1]. In [2], the cutoff rate and the capacity of MPPM in the noiseless photon counting channel are derived and MPPM is shown to outperform PPM in terms of both cutoff rate and capacity. The optical channel is often modeled by the noiseless photon counting channel, but noise due to background and detector dark currents exists on the practical optical channel. However, the performance of MPPM in the noisy photon counting channel, such as its symbol error rate and bit error rate, has not been analyzed yet. In this paper, we analyze the performance of MPPM in the noisy photon counting channel and propose an interleaved convolutional coded MPPM system in order to reduce the average transmitter power. It is shown that the proposed system can reduce the average transmitter power compared with uncoded MPPM.

In MPPM, the laser is pulsed in $p$ slots in one signal block consisting of $m$ slots with duration $\tau$, and $J = \binom{m}{p}$ pulse patterns can be formed by combining the positions of optical pulses. The optical channel is well modeled by Poisson statistics, under which the output of the channel is a doubly stochastic Poisson process with intensity $\lambda_s(t) + \lambda_n$, where $\lambda_s(t)$ is the mean rate in photons per second due to the signal impinging on the photodetector and $\lambda_n$ is the noise intensity due to background and detector dark currents.

First we derive the probability of symbol error of MPPM in the noisy photon counting channel. The probability of symbol error is given by

$$P(\epsilon) = \frac{1}{J}\sum_{i=1}^{J} P(\epsilon|i) \qquad (1)$$

where $P(\epsilon|i)$ is the probability of symbol error when symbol $i$ is sent. Assuming that a wrong decision is made whenever the correct symbol and some other symbol have equal counts, we have

$$P(\epsilon|i) \le \Pr\Bigl[\bigcup_{j \ne i}\{N_i \le N_j\} \Bigm| i\Bigr] \le \sum_{j \ne i} \Pr[N_i \le N_j \mid i] \qquad (2)$$

where the second inequality is justified by the union bound. And we have

$$\Pr[N_i \le N_j \mid i] = \Pr[\tilde{N}_i \le \tilde{N}_j] \qquad (3)$$

where $\tilde{N}_i$ and $\tilde{N}_j$ are independent Poisson random variables with means $d_{ij}(\lambda_s + \lambda_n)\tau$ and $d_{ij}\lambda_n\tau$, respectively, and $d_{ij}$ is the distance between symbol $i$ and symbol $j$. The distance is defined as $d_{ij} = p - v$, where $v$ is the number of overlapped pulses between symbol $i$ and symbol $j$. It follows that

$$\Pr[N_i \le N_j \mid i] = Q_1\bigl(\sqrt{2 d_{ij}\lambda_n\tau},\ \sqrt{2 d_{ij}(\lambda_s + \lambda_n)\tau}\bigr) \qquad (4)$$

where $Q_1(\alpha, \beta)$ is Marcum's Q-function. Eq. (4) can be simplified by using a Chernoff bound as

$$\Pr[N_i \le N_j \mid i] \le \exp\bigl[-d_{ij}\bigl(\sqrt{(\lambda_s + \lambda_n)\tau} - \sqrt{\lambda_n\tau}\bigr)^2\bigr]. \qquad (5)$$

Combining Eqs. (1), (2), (4) and (5), we obtain

$$P(\epsilon) \le \frac{1}{J}\sum_{i=1}^{J}\sum_{j \ne i} \exp\bigl\{-d_{ij}\bigl(\sqrt{(\lambda_s + \lambda_n)\tau} - \sqrt{\lambda_n\tau}\bigr)^2\bigr\}. \qquad (6)$$
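
The union-Chernoff bound on the symbol error probability can be evaluated by brute force for small $(m, p)$. The sketch below assumes that `signal` and `noise` are the mean photon counts per slot (signal intensity times slot duration and noise intensity times slot duration); the function name and the numerical values are illustrative and not from the paper.

```python
from itertools import combinations
from math import exp, sqrt

def mppm_symbol_error_bound(m, p, signal, noise):
    """Union-Chernoff bound on the symbol error probability of (m, p) MPPM.

    signal: mean signal count per pulsed slot; noise: mean noise count per slot.
    """
    # Every symbol is a set of p pulsed slot positions out of m.
    symbols = [frozenset(c) for c in combinations(range(m), p)]
    J = len(symbols)
    gap = (sqrt(signal + noise) - sqrt(noise)) ** 2
    total = 0.0
    for i, a in enumerate(symbols):
        for j, b in enumerate(symbols):
            if i != j:
                d = p - len(a & b)      # distance: p minus the overlapped pulses
                total += exp(-d * gap)  # Chernoff bound on one pairwise error
    return total / J

# Hypothetical operating point: 10 signal photons and 1 noise photon per slot.
print(mppm_symbol_error_bound(6, 2, signal=10.0, noise=1.0))
```

The double loop costs on the order of $J^2$ operations, so for large $m$ and $p$ one would group the symbol pairs by distance instead of enumerating them; being a union bound, the value can exceed one at low signal levels.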

Defining $\xi$ as the number of signal photons per information nat, we have $\lambda_s\tau = \xi R \ln(J)$, where $R$ is the code rate. Next we show the bit error probability of MPPM in the noisy photon counting channel. By using the probability of symbol error, the bit error probability is approximately bounded by $P_b \le \frac{2^{L-1}}{2^L - 1} P(\epsilon)$, where $L$ is the maximum integer satisfying $L \le \log_2 \binom{m}{p}$. Figure 1 shows the bit error probability of MPPM in the noisy photon counting channel as a function of signal energy in photons/nat. The probability of pulse occurrence is selected to be the same among all schemes. It is found that increasing $m$ and $p$ improves the performance of MPPM. Similar trends are obtained for the noiseless case with $\lambda_n\tau = 0.0$, because MPPM can form more symbols than PPM at the same probability of pulse occurrence.

In order to reduce the average transmitter power, we propose an interleaved convolutional coded MPPM system in the noisy photon counting channel. Each block of $L$ input bits is fed into $L$ parallel encoders. The encoded bits are properly interleaved and each group of $L$ bits is sent with $(m, p)$ MPPM. On the decoding side, $L$ parallel Viterbi decoders are employed and the symbol is hard-demodulated by deciding on the $p$ slots with the largest counts in each frame. In this case, we model the $(m, p)$ MPPM channel as a parallel combination of binary symmetric channels (BSC)
with  transition  probabilities  q  and  1  -q  given  by  ^ 

When binary data are sent, the $2^L$ symbols which have the best distance characteristics are selected from the $J = \binom{m}{p}$ symbols. Therefore the transition probability $q$ is bounded by the above equation. Using the union bound on the first-event error probability, it can be shown that the bit error probability for a rate $R = 1/n$ convolutional code is bounded by $P_b \le \sum_{h = d_{free}}^{\infty} w_h P_2(h)$, where $w_h$ is the number of bit errors contributed by the incorrect paths which are at distance $h$ from the correct path, and $d_{free}$ is the minimum free distance of the code. For the BSC, the pairwise error probability $P_2(h)$ is upper bounded by $P_2(h) \le \{2[q(1 - q)]^{1/2}\}^h$. Figure 2 shows the bit error probability of the proposed system with $R = 1/2$ and constraint length $k$ convolutional codes, where $P_b$ is approximated by the error-event probability. It is found that the proposed system can greatly reduce the average transmitter power compared with uncoded MPPM in both the noisy and the noiseless cases. It is also found that the system with larger constraint length $k$ has better performance because of its higher error correction ability. For example, relative to uncoded MPPM, the proposed system with $k = 7$ reduces the signal energy required for the same bit error probability from 12.0 to 4.2 in the noisy case with $\lambda_n\tau = 1.0$ and from 8.3 to 2.3 in the noiseless case with $\lambda_n\tau = 0.0$. Therefore the proposed system is effective in reducing the average transmitter power.
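
The union bound on the bit error probability of a convolutional code over a BSC is straightforward to evaluate once the bit-error weights of the code are known. As an illustration only, the sketch below uses the rate-1/2, constraint length 3 code with octal generators (5, 7), whose free distance is 5 and whose bit-error weights are 1, 4, 12, 32, ... at distances 5, 6, 7, 8, ...; this is not one of the codes evaluated in the paper, and the crossover probability `q` is a hypothetical value.

```python
from math import sqrt

def pairwise_bound(q, h):
    """Upper bound {2*sqrt(q*(1-q))}**h on the pairwise error probability."""
    return (2.0 * sqrt(q * (1.0 - q))) ** h

def bit_error_bound(q, d_free=5, n_terms=30):
    """Truncated union bound: sum over h of w_h times the pairwise bound.

    The weights w_h = (h - d_free + 1) * 2**(h - d_free) are those of the
    assumed (5, 7) code; another code needs its own weight table.
    """
    total = 0.0
    for h in range(d_free, d_free + n_terms):
        w_h = (h - d_free + 1) * 2 ** (h - d_free)
        total += w_h * pairwise_bound(q, h)
    return total

# Hypothetical BSC crossover probability.
print(bit_error_bound(0.01))
```

For this particular code the series has the closed form $z^5/(1 - 2z)^2$ with $z = 2\sqrt{q(1-q)}$, valid for $z < 1/2$, which gives a quick check on the truncation.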
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Fig. 2. BER for interleaved convolutional coded MPPM as a function of constraint length k and signal energy in photons/nat.
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The  distribution  law  of  the  Brownian  Motion  exponential  func¬ 
tional; 

e  =  exp  [jV5^W,j  dl 

where $\{W_t,\ t \in \mathbb{R}^+\}$ is a standard Brownian Motion, which models the laser's phase noise, plays a key role in many of the recently proposed heterodyne lightwave communication systems. The exact derivation of these statistics appears, however, to be a formidable mathematical task [1]. Invoking a signal flow graph formulation and combinatorial arguments leads to a simple and computationally efficient closed form formula for the $k$'th power moment $E\epsilon^k$ induced by the unknown distribution law. These results are useful in bounding the performance of a variety of lightwave communications systems operating in presence of phase noise [2],[3].

The expression for $E[e^k]$ is given in terms of a
$((k+1)^2 - \tfrac{1}{2}(k+1)k)$-fold summation


The  computational  complexity  of  c(m,  n)  is  of  an  exponential  order 
in  k  rather  than  a  factorial  order  characterizing  previously  reported 
methods. We also address the efficient derivation of $E[e^k]$ given the set
of the preceding moments, motivated by the fact that power
moment statistical characterization often requires the availability of a
finite set of consecutive moments [2],[3]. It is shown that
$E[e^k]$ is readily given as the convolution of the preceding moments with
a set of known causal functions.
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^  m=0  n=0 

where  c(m,  n)  is  a  rational  coefficient  given  by 


c{m,n)  = 


(«m  +  n„+i  -  n  -  1)' 


max(m.l)<r<fc 
00  =  1,  Or+J  =0 
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For  high  quantization  rates,  Lookabaugh  and  Gray  have  demonstrated  that 
the advantages of optimum vector quantization over optimum scalar quantization
can be separated into the space-filling advantage, the shape advantage,
and the memory advantage [1]. Eyuboglu and Forney have proposed
a  lattice-based  VQ  in  which  the  codebook  consists  of  all  the  lattice  points 
that  lie  inside  a  suitably  chosen  support  region  [2].  They  showed,  for  mem¬ 
oryless  sources,  there  are  two  significant  gains  —  the  boundary  gain  and 
the  granular  gain  —  that  lattice-based  vector  quantizers  realize  over  uni¬ 
form  scalar  quantizers.  For  memoryless  sources,  the  scalar-vector  quantizer 
of  Laroia  and  Farvardin  [3]  can  asymptotically  (in  block-length)  realize  the 
optimal boundary gain. It, however, realizes no granular gain. The trellis
coded  quantizer  of  Marcellin  and  Fischer  [4],  on  the  other  hand,  can  realize 
a  significant  portion  of  the  total  granular  gain,  but  makes  no  explicit  at¬ 
tempt  to  capitalize  on  the  boundary  gain.  Recently,  Laroia  and  Farvardin 
have  combined  the  above  two  ideas  and  proposed  a  fixed-rate  trellis-based 
scalar-vector  quantizer  (TB-SVQ)  [5],  which  realizes  both  boundary  and 
granular  gains. 

The TB-SVQ is the dual of optimally-shaped trellis-coded constellation
for transmission over memoryless channels [6]. Laroia, Tretter and
Farvardin have recently proposed a precoding scheme [7] that solves the
problem of realizing both shaping and coding gains for transmission over
intersymbol interference channels. In this paper, we combine the TB-SVQ
idea  of  [5]  and  the  precoding  idea  of  [7]  to  develop  a  quantization  scheme 
for  correlated  sources. 

We assume the source {X_n} is the output of a linear pth-order autoregressive
(AR) filter H(z) driven by a stationary memoryless innovation
process {W_n}, where $H(z) = 1/(1 - \sum_{i=1}^{p} \rho_i z^{-i})$. We propose a quantization
scheme whose block diagram is shown in Fig. 1. This quantizer, referred
to as the precoded quantizer (PQ) and motivated by the aforementioned precoding
scheme, combines the precoder (to remove the source correlation) and
the TB-SVQ (to achieve both boundary and granular gains).
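The role of the AR model can be illustrated with a minimal sketch. It generates an AR(p) source from a white Gaussian innovation and then recovers the innovation with the inverse (prediction-error) filter $1 - \sum_i \rho_i z^{-i}$. This is plain linear decorrelation only; it is not the precoder of [7] and involves no quantization. The function names `ar_source` and `decorrelate`, the Gaussian innovation, and the zero initial conditions are assumptions made for illustration.

```python
import numpy as np

def ar_source(rho, n, seed=0):
    """Generate n samples of an AR(p) source X_t = W_t + sum_i rho[i-1] * X_{t-i}.

    Assumes a unit-variance Gaussian innovation and zero initial conditions.
    Returns the source samples and the innovation that produced them.
    """
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)
    x = np.zeros(n)
    for t in range(n):
        past = sum(r * x[t - i] for i, r in enumerate(rho, start=1) if t - i >= 0)
        x[t] = w[t] + past
    return x, w

def decorrelate(x, rho):
    """Apply the inverse filter 1 - sum_i rho[i-1] z^{-i} to recover the innovation."""
    w_hat = x.copy()
    for i, r in enumerate(rho, start=1):
        w_hat[i:] -= r * x[:-i]
    return w_hat

if __name__ == "__main__":
    x, w = ar_source([0.9], 5000)
    print("lag-1 correlation of source:", np.corrcoef(x[:-1], x[1:])[0, 1])
    print("max |W - W_hat|:", np.abs(decorrelate(x, [0.9]) - w).max())
```

With $\rho_1 = 0.9$ the source samples are strongly correlated, while the filtered sequence matches the innovation, which is the domain in which the codebook shape is defined.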

The trellis code and SVQ share the common underlying scalar alphabet
Q = {..., -3β, -β, β, 3β, ...}. The source sequence {X_n} is first mapped
to the coset trellis sequence {A_n}, which serves as a candidate quantization
sequence. To check if a block of samples of {A_n} is inside the codebook,
{A_n} is decorrelated to {B_n + Q_n}, where {B_n} is a valid trellis sequence
(congruent to {A_n}) and {Q_n} is some noise sequence that is confined to the
Voronoi region of the coset lattice of the trellis code. The coset quantizer
removes {Q_n} and produces {B_n}. For high-rate quantization, {A_n} is
a good approximation to {X_n} and the energy of {Q_n} can be ignored;
therefore {B_n} is a good approximation to {W_n}. A TB-SVQ designed for
{W_n} is used here for encoding {B_n}. The SVQ encoder takes a block of
samples from {B_n} and decides if the vector lies inside the codebook (defined
in the innovation domain). If it does, the TB-SVQ encoding algorithm is
used to encode the vector. If not, the vector {X_n} is gradually moved
closer to the codebook boundary by considering the geometric shape of
the boundary induced by the distribution of {W_n}. This is repeated until
the corresponding block of {B_n} is inside the codebook. In the decoder,
assuming no channel errors occur, the output of the TB-SVQ
decoder is {B_n}. The precoding scheme of [7] is used here to generate
the quantization sequence {A_n}.

The shape of the codebook is defined in the innovation domain. The corresponding
codebook shape in the source domain is not necessarily optimal.
This is due to the occurrence of the overhead noise {Q_n}. This is analogous
to  the  increase  in  the  transmitted  power  for  the  precoding  scheme  described 
in  [7].  Reducing  the  energy  of  {Qn}  should  improve  the  performance  of  the 
PQ.  This  can  be  done  by  using  higher  dimensional  trellis  codes. 

The performance of PQ for quantizing Gauss-Markov sources was obtained
via simulations using 100 sequences of 32,000 samples each. The
trellis decoding delay is 100 samples. Table 1 summarizes the performance
(SNR in dB) of the PQ for encoding an AR(1) Gauss-Markov source
($\rho_1 = 0.9$) for dimensions 32 and 64 at rates 2, 3 and 4 bits/sample. As expected,
the PQ using a 2D trellis code performs better than using a 1D trellis
code. A two-codebook [5] 64-dimensional PQ using a 2D trellis code at rate
3 bits/sample yields 0.04 to 0.06 dB performance improvement. This illustrates
the advantage of applying codebooks with variable density. Compared
to the predictive trellis coded quantizer (PTCQ) [4], for the same number
of trellis states (4, 8, 16 and 32), the 64-dimensional PQ (with 100 samples
delay) performs 0.5-1.5 dB better than PTCQ (with 1000 samples delay)
for rate 3 bits/sample. For rate 2 bits/sample, the performances are about
the same.


Figure  1:  Block  diagram  of  the  precoded  quantizer. 
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Table 1: Performance (SNR in dB) of the PQ for an AR(1) Gauss-
Markov source ($\rho_1 = 0.9$).
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Suppose we wish to estimate a random vector Y from a random
vector X with an estimator h(·) that is constrained to take on
a finite set of N values. For the mean squared error criterion, the
optimal nonlinear estimator h(x) is given by the cascade of the
optimal unconstrained estimator g(x) = E[Y | X = x] followed
by the optimal vector quantizer for the random vector g(X). The
vectors X and Y may have different dimensions. We view h(x) as
a generalized vector quantizer which optimally generates a quantized
approximation to Y from observation of X. The special case
where X = Y + W and W is independent additive noise was studied
by Wolf and Ziv [1] and Ephraim and Gray [2]. Sakrison [3] considered
the more general formulation of source encoding in the
presence of a random disturbance.

Since h(x) has finite range, its domain can be partitioned into
N sets, S_i, each the pre-image of a range value, where N is the
size of the range set. It is readily shown that

(a) the optimal range values {y_i} for a given partition are given
by y_i = E[g(X) | X ∈ S_i], and

(b) the optimal partition regions given the range values are:
S_i = {x : ||g(x) - y_i|| ≤ ||g(x) - y_j|| for all j}, ignoring
boundary values.
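Conditions (a) and (b) suggest a Lloyd-type alternation applied to samples of g(X). The sketch below is a scalar illustration under stated assumptions: Y and W are taken to be independent zero-mean Gaussians with variances 1 and 0.25, so that g(x) = E[Y | X = x] = 0.8x is available in closed form. The function name `design_estimator` and the quantile initialization are hypothetical choices, not part of the original formulation.

```python
import numpy as np

def design_estimator(gx, n_levels, n_iter=30):
    """Alternate conditions (b) and (a) on scalar samples of g(X).

    Returns the range values y_i and the empirical distortion after each pass.
    """
    # Hypothetical initialization: evenly spaced quantiles of g(X).
    y = np.quantile(gx, (np.arange(n_levels) + 0.5) / n_levels)
    history = []
    for _ in range(n_iter):
        # (b) nearest-neighbor partition in the g(x) domain
        idx = np.abs(gx[:, None] - y[None, :]).argmin(axis=1)
        # (a) each range value becomes the conditional mean over its cell
        for i in range(n_levels):
            if np.any(idx == i):
                y[i] = gx[idx == i].mean()
        history.append(((gx - y[idx]) ** 2).mean())
    return y, history

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y_src = rng.normal(size=20000)                  # Y ~ N(0, 1)  (assumed)
    x_obs = y_src + 0.5 * rng.normal(size=20000)    # X = Y + W, W ~ N(0, 0.25)
    gx = 0.8 * x_obs                                # g(x) = E[Y | X = x] for this model
    levels, hist = design_estimator(gx, 4)
    print("range values:", levels)
    print("distortion in g-domain:", hist[0], "->", hist[-1])
```

When g is not known in closed form, the same alternation would have to work from empirical pairs (X, Y), which is where the non-convex, possibly disconnected regions S_i in the X domain become an issue.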

In practice, design of h(x) can be based on a large set of empirical
data pairs (X, Y) as a statistical specification of the random vectors.
In general, the domain regions S_i are neither convex nor connected
sets. Thus, conventional vector quantizer design methods
are inadequate. The optimal h(x) must therefore be implemented
as a pattern classifier (an encoder) that maps the input X to an
index i followed by a decoder, a table-lookup operation with i as
input.

The above formulation and resulting design methodology
offer a notable improvement to a useful paradigm in vector quantization
(VQ), called nonlinear interpolative vector quantization
(NLIVQ). The basic theory of NLIVQ was introduced in [4] and
has found several applications, including multiresolution image
compression [5], multispectral image compression [6], nonlinear
prediction of speech [7], wideband audio compression [8], and
enhanced decoding of standard transform coded images [9]. In
NLIVQ, a signal vector Y of dimension m is mapped by a feature
extractor into a vector X of dimension k (usually k < m) which is
then VQ encoded, producing an index (channel symbol) i; unlike
ordinary VQ, the decoder directly reconstructs an approximation to
Y (rather than to X) by a table lookup with a codebook of dimension
m. In the special case where X is a subsampled version of Y,
the decoder can perform optimal interpolation of Y from a digital
representation of X. Until now, NLIVQ was based on an optimal
decoder for a given VQ encoder. Here we see that NLIVQ can be
improved by jointly optimizing the encoder (which digitizes X) and
the decoder. This follows as an immediate application of the problem
of optimal nonlinear estimation with finite range, posed in the
first paragraph above. The performance achievable with NLIVQ is
thereby improved and the applicability and utility of NLIVQ is
correspondingly enhanced.

†This work was supported by the National Science Foundation, the UC Micro Program, Rockwell
International, Hughes Aircraft, Eastman Kodak, Compression Labs, and Fujitsu Labs.
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1.  INTRODUCTION 

Vector  Quantization  (VQ)  is  an  important  method  for  source  coding 
and signal compression [1]. VQ is, however, not without problems, e.g.
robustness and complexity. Structured VQ [2] and channel optimized
VQ [3] are areas of current research, aimed at overcoming some of these
problems. In this paper, we introduce a new flexible concept of VQ that
is  robust  both  in  terms  of  training  databases  and  channel  errors.  The 
method  can  also  be  used  to  achieve  reduced  storage  requirements  and 
search  complexity. 

2.  THE  VECTOR  QUANTIZER 

Consider  a  VQ  constrained  by  the  following  linear  mapping 

c  =  Tb  +  m  (1) 

where c is the reconstruction vector of dimension D generated by a
code-vector b. Let the code-vector stem from a binary systematic block-code

$b = (i_1, \ldots, i_k, p_1, \ldots, p_r)^T$ (2)

where the k binary elements $i_j$ are information bits and the $p_j$ are r
redundant modulo-2 sums (parity bits) of the information bits. Thus, b
is a code-vector in a (k + r, k) linear block-code [4]. The $2^k$ different
code-vectors generate an equal number of reconstruction vectors. We
emphasize that only the information bits have to be transmitted in an
application; the redundant bits are introduced to control the degrees of
freedom in the mapping.

To achieve a suitable representation for signal-vector generation
purposes, binary 0 is represented by +1 and binary 1 is represented by
-1 in b. Hence, the points b constitute the ordered subset of the corners
of a (k + r)-dimensional cube, as specified by the block-code. The
projection matrix T is of dimension D×(k + r) and the D-dimensional
vector m represents the mean-value of the source. For a given source to
quantize, both the projection, T, and the block-code must be selected.
For a given block-code, an optimization involves the D×(k + r + 1)
parameters of T and m.

In order to compute an optimized mapping for an arbitrary source,
we use an iterative training technique with a database of representative
source vectors, x. We adopt the squared Euclidean distance as distortion
measure and restrict the discussion to zero-mean sources (m = 0). The
measure to be minimized is then

$\sigma^2 = E[\|x - T\,b(x)\|^2]$ (3)

where b(x) denotes the code-vector used to generate the reconstruction
vector of a certain source vector x. Minimizing this with respect to T
gives an expression for the rows $t_j$ of T

$E[b(x)\,b^T(x)]\,t_j = E[x_j\,b(x)]$, j = 1, ..., D (4)

where $x_j$ is component j of x. We are now able to devise a block-iterative
algorithm for computation of the mapping:

(i)  Initialize  the  matrix  T. 

(ii) Find the nearest reconstruction vector for each vector x of a
training database (i.e. b(x)). Compute the correlations in eq. (4)
for each row j.

(iii) Solve eq. (4) for the D rows of T. Evaluate the distortion $\sigma^2$.

(iv)  Repeat  from  (ii)  until  end  of  training. 
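Steps (i) to (iv) can be sketched as follows for a zero-mean source. This is a minimal illustration, not the authors' implementation: the random initialization of T, the fixed iteration count, and the names `codebook_pm1` and `train_linear_vq` are assumptions. The normal equations (4) are solved for all rows at once with a least-squares call, which is equivalent when the empirical correlation matrix is invertible.

```python
import itertools
import numpy as np

def codebook_pm1(k, parity_sets):
    """All 2^k code-vectors of a systematic code, mapped 0 -> +1 and 1 -> -1.

    parity_sets[j] lists the information-bit positions summed (mod 2)
    to form parity bit j.
    """
    rows = []
    for info in itertools.product([0, 1], repeat=k):
        parity = [sum(info[i] for i in s) % 2 for s in parity_sets]
        bits = np.array(list(info) + parity)
        rows.append(1 - 2 * bits)
    return np.array(rows, dtype=float)          # shape (2^k, k + r)

def train_linear_vq(x, B, n_iter=20, seed=0):
    """Block-iterative training of T in c = T b for zero-mean data x of shape (N, D)."""
    rng = np.random.default_rng(seed)
    T = rng.normal(size=(x.shape[1], B.shape[1]))   # (i) assumed random initialization
    history = []
    for _ in range(n_iter):                          # (iv) fixed number of passes
        recon = B @ T.T                              # all reconstruction vectors, (2^k, D)
        d2 = ((x[:, None, :] - recon[None, :, :]) ** 2).sum(axis=2)
        bx = B[d2.argmin(axis=1)]                    # (ii) b(x) for every training vector
        # (iii) least-squares solution of bx @ T.T ~ x, i.e. eq. (4) for all rows
        T = np.linalg.lstsq(bx, x, rcond=None)[0].T
        history.append(((x - bx @ T.T) ** 2).sum(axis=1).mean())
    return T, history

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(2000, 2))                   # 2-D memoryless Gaussian source
    B = codebook_pm1(3, [(0, 1, 2)])                 # a (4,3) single-parity code
    T, hist = train_linear_vq(x, B)
    print("distortion per vector:", hist[0], "->", hist[-1])
```

Because step (ii) is optimal for a fixed T and step (iii) is optimal for fixed assignments, the recorded distortion does not increase from one pass to the next.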


(ii) Robustness against channel-errors.

Eq. (1) can be expressed as

$c = \sum_{j=1}^{k+r} b_j\,\mathbf{t}^{(j)} + m$ (5)

where $b_j$ is element j of b and $\mathbf{t}^{(j)}$ is column j of T. We define the weights of the bits as the
length of the corresponding vector in (5). Robustness against
channel-errors requires that neighboring reconstruction vectors c have
small Hamming distances in the information part of the corresponding
code-vectors b. A second major result in this paper is that good
neighboring properties in the VQ are obtained by assuring small weights
in the redundant part of the code. The information part shall have high
and fairly uniform weights. The result is a VQ with inherent robustness
against channel errors.

(iii) Fast and robust training.

The  initialization  of  the  matrix  T  is  an  important  issue  for 
convergence  of  iterations  and  channel-error  robustness.  We  have  devised 
several  efficient  techniques  for  initialization  based  on  assigning  low 
weights  to  the  redundant  bits.  Note,  moreover,  that  during  the  iterations, 
the  proposed  algorithm  takes  second-order  effects  into  account  in  each 
adjustment of the VQ by solving for eq. (4). By using few redundant
bits, the number of parameters to adjust is low and, hence, robustness
against variability in training databases is ensured.


4.  RESULTS 

Figure 1 below shows reconstruction vectors with associated Voronoi
regions and a plot with vectors at Hamming distance one connected for
two 2-dimensional cases designed by the proposed method.


[Figure 1 panels. Left: Voronoi regions and Hamming-1 neighbors, 3 information bits and 1 parity bit. Right: Voronoi regions and Hamming-1 neighbors, 4 information bits and 4 parity bits.]

Figure 1. Illustration of results for a 2-dimensional memoryless Gaussian
source. Left part: Linear mapping of a (4,3) block-code. Right part: Linear
mapping of an (8,4) block-code.


Tables 1 and 2 below illustrate some results, in terms of signal-to-noise
ratios, obtained for a memoryless Gaussian source and a Gauss-
Markov source with correlation 0.5.


Table  1.  Memoryless  Gauss  source.  Table  2.  Gauss-Markov  source  0.5. 
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The results are given in dB and are obtained as the mean-value of
evaluation over 3 independent data-bases, each of size 500 000 vectors.


3.  PROPERTIES 

This  way  of  representing  a  Vector  Quantizer  has  a  number  of 
desirable  properties. 

(i)  Few  parity  bits  are  needed. 

By using every available parity bit, $r = 2^k - k - 1$, there are $D \times 2^k$ free
parameters in T and m, i.e. we are able to generate an arbitrary set of
reconstruction vectors. This gives an unconstrained VQ. For a given
application, we can choose any number of parity bits between this
maximum and zero. A main result of this paper is that by using only a
few parity bits, or even none, one obtains a robust result close to the
unconstrained case.
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Tree-structured  vector  quantization  (TSVQ)  is  an  im¬ 
portant  alternative  to  full-search  vector  quantization  since 
it reduces the encoding search complexity from O(N) to
O(log N), where N is the decoder codebook size, while incurring
some additional distortion. Typically, the methods
for  designing  trees  are  greedy,  growing  the  trees  outward  from 
the  root,  node-by-node,  by  minimizing  a  suitable  cost  func¬ 
tion  at  each  node.  The  optimization  at  each  non-leaf  node 
does  not,  in  general,  reflect  the  node’s  eventual  role  as  a  dis¬ 
criminant  function,  partitioning  the  input  space  in  order  to 
minimize  the  distortion  incurred  at  the  descendent  leaves.  In 
the  splitting  algorithm  [1],  for  example,  the  test  vectors  at  a 
non-leaf  node  are  chosen  to  minimize  the  distortion  incurred 
at  the  given  node,  instead  of  minimizing  the  distortion  at  the 
leaves.  The  resulting  tree  may  be  improved  simply  by  mod¬ 
ifying  individual  nodes  of  the  tree  in  order  to  achieve  better 
agreement  with  nearest  neighbor  classification  at  the  leaves. 

As an example, consider the TSVQ solutions of Figure 1.
The sub-optimal solution of Figure 1a was achieved
by the splitting algorithm, despite a split at the highest level
which minimized the node’s distortion. To improve the splitting
solution, given fixed leaves, the highest level boundary
should be chosen to minimize the distortion at the leaves.
Finding the optimal boundary is equivalent to obtaining a
minimum risk linear discriminant. Several methods from the
pattern recognition field are applicable [2]. For the example
of Figure 1, it suffices to choose the boundary in order to
improve the agreement with nearest neighbor ownership at
the leaves. This can be accomplished by choosing the highest
level representatives to be the centroids of data “owned” in a
nearest neighbor sense by the representatives' descendants
at the leaves. Then, recalculating the leaf centroids yields
the global minimum solution of Figure 1b.

*Supported by the Engineering Foundation with the cooperation of
IEEE, grant RI-A-92-12
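The centroid-based update described for the highest-level representatives can be sketched as below. This is a hedged illustration of the low-complexity rule only (not a minimum-risk discriminant): each parent representative is replaced by the centroid of the training data owned, in a nearest-neighbor sense, by its descendant leaves. The function name `update_parents` and the `children` list layout are hypothetical, and every parent is assumed to own at least one sample.

```python
import numpy as np

def update_parents(data, leaves, children):
    """Recompute parent representatives from nearest-neighbor leaf ownership.

    data:     (N, D) training vectors
    leaves:   (L, D) leaf codevectors (held fixed)
    children: children[p] lists the leaf indices descending from parent p
    """
    # Nearest-neighbor ownership of every training vector among ALL leaves.
    d2 = ((data[:, None, :] - leaves[None, :, :]) ** 2).sum(axis=2)
    owner = d2.argmin(axis=1)
    parents = []
    for kids in children:
        mask = np.isin(owner, kids)
        # Assumes each parent owns at least one sample (otherwise the mean is undefined).
        parents.append(data[mask].mean(axis=0))
    return np.array(parents)

if __name__ == "__main__":
    data = np.array([[0.0], [1.0], [10.0], [11.0], [0.2]])
    leaves = np.array([[0.0], [1.0], [10.0], [11.0]])
    print(update_parents(data, leaves, children=[[0, 1], [2, 3]]))
```

After such an update the leaf codevectors would be recomputed as centroids of their new partitions, and the two steps repeated.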


To generalize the method to trees of any depth, at each
step, we fix all nodes of the tree but one and seek to minimize
the risk associated with this node. Given the updated
node, we can then optimize its descendant leaf codevectors
as the centroids of their respective partitions. This process is
repeated over all nodes of the tree. In principle, if the risk is
minimized at each node, the node updates are non-increasing
in the distortion of the tree and the method can be iterated
until convergence. In our simulations, we applied the crude
approximation of updating node representatives as centroids
of the data “owned” in a nearest neighbor sense by their descendants
at the leaves. This low-complexity version of our
method does not guarantee a descent, though in our simulations
it does improve upon the splitting solution. We tested
Gaussian and Gauss-Markov sources with vector dimensions
from four to eight and tree depths from five to ten. Typically,
our method gained 20-30 percent of the performance
gap between TSVQ via splitting and full-search VQ via the
LBG method. We expect that more improvement should be
possible with the use of a better linear discriminant.

A shortcoming of the approach described above is the
dependence on the tree initialization. For complex data distributions,
the problem of local minimum traps can become
severe, and prompts us to seek a method which is insensitive
to initialization, and which can avoid some local minima.
Motivated by the deterministic annealing method for
unstructured vector quantization and clustering [4], we have
derived a related approach for the hierarchically structured
clustering problem. For the structured problem, we view
the hierarchical partitioning requirement as prior knowledge,
and, accordingly, invoke the principle of minimum cross entropy.
The resulting method has been tested on challenging
problems involving normal mixtures, and has been found to
obtain significant improvement over both the splitting algorithm
and, for some examples, the K-means clustering algorithm.
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Abstract 

We  address  progressive  transmission  of  full  search  im¬ 
age  vector  quantization.  We  build  a  progressive  trans¬ 
mission  tree  to  define  binary  mergings  of  codewords  for 
successively  smaller  sized  codebooks.  The  tree  design 
methods  we  apply  are  the  generalized  Lloyd  algorithm 
splitting  algorithm,  minimum  cost  perfect  matching,  and 
a  method  of  principal  eigenvectors. 

Vector  quantization  (VQ)  [1]  is  a  lossy  compression  technique 
that  has  been  used  extensively  for  image  compression.  Progres¬ 
sive  image  transmission  allows  an  image  being  transmitted  to 
be  recognized  early  at  the  receiver;  this  saves  bandwidth  if  the 
wrong  image  is  being  sent.  We  present  three  new  methods  for 
the  selection  of  codeword  indexes  which  allow  for  direct  progres¬ 
sive  transmission  of  images  compressed  with  full  search  VQ.  In 
all  cases,  we  fit  a  tree  of  intermediate  codewords  to  a  full  search 
VQ  codebook  and  use  the  tree  indexes  as  the  codeword  indexes. 

A  full  search  progressive  transmission  tree  allows  full  search 
VQ  to  be  sent  progressively.  It  is  a  balanced  tree  whose  terminal 
nodes  or  leaves  are  labeled  by  codewords  generated  by  a  code¬ 
book  design  technique  and  whose  internal  nodes  are  labeled  by 
intermediate  codewords  derived  from  the  leaf  codewords.  The 
tree  is  used  to  reassign  the  original  indexes  of  the  leaf  codewords 
to  new  indexes  that  are  compatible  with  progressive  transmis¬ 
sion.  With  each  bit,  the  receiver  displays  the  intermediate  code¬ 
word  located  at  the  internal  node  being  visited  in  the  tree. 

We  use  region-merging  to  build  the  progressive  transmis¬ 
sion  tree  and  determine  the  intermediate  codewords.  A  region¬ 
merging  tree  is  formed  by  merging  Voronoi  regions  of  the  original 
codebook  in  pairs  to  form  larger  encoding  regions. 

Ordered VQ codebooks provide a simple method for building
the region-merging tree. In an ordered VQ codebook, the
codewords  with  neighboring  indexes  are  also  neighbors  in  the  in¬ 
put  space.  The  region-merging  tree  is  built  by  simply  merging 
together  regions  with  neighboring  codeword  indexes.  We  found 
that  the  generalized  Lloyd  splitting  algorithm  (GLA)  gives  code¬ 
books  that  are  reasonably  well  ordered. 

Another method of forming a region-merging tree is minimum
cost  perfect  matching  (MCPM)  from  optimization  theory  [2].  In 
MCPM  we  have  a  complete  graph  of  nodes  and  a  cost  associated 
with  matching  each  different  pair  of  nodes.  The  cost  of  the 
overall  matching  is  the  total  sum  of  the  costs  of  matching  each 
graph  node  pair.  To  construct  the  region-merging  tree,  we  choose 
the  graph  nodes  to  be  the  Voronoi  regions  defined  by  the  original 
codebook  and  the  cost  to  be  the  increase  in  distortion  due  to 
merging  two  Voronoi  regions  together.  The  tree  is  built  from  the 
bottom  up  by  repeatedly  solving  the  MCPM  problem.  Running 
MCPM  to  find  the  matching  with  minimum  cost  assures  that  the 
increase  in  distortion  at  the  next  level  of  the  tree  is  minimized. 

For progressive transmission, the image quality at lower rates
is more important than at high rates. Unlike MCPM, which is
a bottom-up method, our third approach is a top-down method
which seeks to minimize the distortion at lower bit rates. In this
case, we start with the centroid of the codebook and successively
divide the codewords in half. The problem reduces to finding an
optimal partition to separate the size N codebook into two size
N/2 subcodebooks to maximize the decreased distortion between
the centroid of the size N codebook and the two centroids of the
size N/2 codebooks. Unfortunately, there are C(N, N/2) possible
ways to choose the partition and solving this problem becomes
impractical for moderate N. Our heuristic places a hyperplane
perpendicular to the principal eigenvector of the training set to
divide the codewords in half. Some codewords near the hyperplane
are exchanged to maximize the decreased distortion. The
tree is built from the top down by repeatedly solving this optimization
problem.

Figure 1: The normalized MSE distortion at each bit rate for
intermediate codebooks
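The hyperplane step of the top-down heuristic can be sketched as follows. This is a simplified illustration under stated assumptions: the hyperplane offset is chosen so that the codewords split exactly in half (by sorting their projections onto the principal eigenvector of the training covariance), and the subsequent exchange of codewords near the hyperplane is omitted. The function name `eigen_split` is hypothetical.

```python
import numpy as np

def eigen_split(codewords, training):
    """Split codewords into two equal halves along the principal eigenvector.

    codewords: (N, D) codebook with N even
    training:  (M, D) training vectors
    Returns two index arrays, one per half.
    """
    mean = training.mean(axis=0)
    cov = np.cov(training - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    principal = eigvecs[:, -1]                  # direction of largest variance
    proj = (codewords - mean) @ principal       # signed distance along that direction
    order = np.argsort(proj)
    half = len(codewords) // 2
    return order[:half], order[half:]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    training = rng.normal(size=(1000, 2)) * np.array([5.0, 0.5])
    codewords = np.array([[-3.0, 0.1], [-1.0, -0.2], [1.0, 0.3], [3.0, 0.0]])
    low, high = eigen_split(codewords, training)
    print("halves:", sorted(low.tolist()), sorted(high.tolist()))
```

Applying the same split recursively to each half would build the tree from the top down; the sign of the eigenvector is arbitrary, so which half is labeled first carries no meaning.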

Figure 1 is the normalized MSE at each bit rate for intermediate
codebooks for the test medical images. The GLA followed by
one-step optimal MCPM (GLA/MCPM) slightly outperformed
the GLA at most bit rates with a maximum improvement of 0.44
dB. We feel that the simplicity of the GLA and the only slight
difference in performance make the GLA more attractive than
GLA/MCPM. In the same figure, GLA/EIGEN, which represents
the principal eigenvector method, outperforms even TSVQ
at most bit rates and of course gives lower distortion at the final
bit rate.
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Abstract:  Generalized  theta  functions  of  lattices  for 
metrics of type $l_p$ are introduced and computed when
a  coding-theoretic  construction  of  type  A,B,  or  C  of  the 
lattice  used  for  VQ  exists.  Upper  bounds  of  the  saddle 
point  type  and  geometric  lower  bounds  on  certain  sums  of 
their  coefficients  are  derived  and  applied  to  the  estimation 
of  the  size  of  the  codebook  consisting  of  all  points  of  the 
lattice  within  a  sphere  of  given  radius  for  the  concerned 
metric. 
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1  Problem  and  Motivation 

When quantizing a multidimensional source with given
pdf, the asymptotic equipartition principle of information
theory tells us that the samples of the source will
concentrate on or about the equiprobable surface. In the
case of a Gaussian law, these surfaces are the usual Euclidean
spheres. If a lattice Λ is used for quantizing the source
with a sphere-shaped codebook then estimating the number
of points inside the sphere is essential for determining
the transmission rate.

We consider the case of equiprobable surfaces that are
spheres for the $l_p$ metric of the type

$S_p(n,m) = \{x \in \mathbb{R}^n : \|x\|_p \le m\}$,

where $\|x\|_p^p = \sum_{i=1}^{n} |x_i|^p$. When the source is Laplacian,
p = 1, and $S_1(n,m)$ is a so-called pyramid (or hyperoctahedron).
This case has practical applications in image
processing [4].


2  High  Rate  Approximation 

Gauss' counting principle says that the number of points
with integer coordinates in a convex body is well approximated
by its volume. This was proved to fail for
S_2(n, y/^) and large n in [2]. If, however, n, p are fixed
and m is large this is a mere application of Riemann sums.
If a lattice Λ is used with fundamental volume (= volume
of its Voronoi cell) vol(Λ) this says that

$|\Lambda \cap S_p(n,m)| \approx \mathrm{vol}(S_p(n,m)) / \mathrm{vol}(\Lambda)$ (1)

Explicit formulas for $\mathrm{vol}(S_p(n,m))$ can be found in [3].
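A small numerical check of approximation (1) is sketched below for the simplest case, the integer lattice $\mathbb{Z}^n$ (fundamental volume 1) and p = 1. It counts lattice points by brute force and compares with the volume of the closed $l_1$ ball of radius m, which is $(2m)^n / n!$. The choice of lattice, the closed ball, and the function names are assumptions made for illustration.

```python
import itertools
import math

def count_l1_points(n, m):
    """Number of points of Z^n with l_1 norm at most m (brute-force enumeration)."""
    return sum(
        1
        for x in itertools.product(range(-m, m + 1), repeat=n)
        if sum(abs(c) for c in x) <= m
    )

def l1_ball_volume(n, m):
    """Volume of the n-dimensional l_1 ball (cross-polytope) of radius m."""
    return (2 * m) ** n / math.factorial(n)

if __name__ == "__main__":
    for n, m in [(2, 3), (2, 50), (3, 20)]:
        exact = count_l1_points(n, m)
        approx = l1_ball_volume(n, m)
        print(f"n={n} m={m}: count={exact} volume={approx:.1f} ratio={exact / approx:.3f}")
```

For fixed n the ratio tends to 1 as m grows, in line with the Riemann-sum argument; the enumeration cost grows as $(2m+1)^n$, so this check is only practical in low dimension.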


3  Generalized  Theta  Function 

Let

$\theta_{p,\Lambda}(q) = \sum_{x \in \Lambda} q^{\|x\|_p^p}$ (2)

When p = 2, (2) is just the classical theta function [1] and
when p = 1 the Nu function of [4, 5]. When Λ is obtained
from a binary code C by construction A, we have

$\theta_{p,\Lambda}(q) = W_C(\theta_{p,2\mathbb{Z}}(q), \theta_{p,2\mathbb{Z}+1}(q))$, (3)

where $W_C$ is the weight enumerator of C.


4  Saddle  Point  Approximation 

Let $g(s) = \theta_{p,\Lambda}(e^{-s})$ for s > 0.

Theorem 1

$|\Lambda \cap S_p(n,m)| \le g(s_0)\, e^{m^p s_0}$
where $s_0$ is the unique nonnegative real solution of
$m^p g(s) = -g'(s)$.

If the saddle point approximation applies then this bound
is Θ(RHS of (1)).

5  Voronoi  Covering  Bound 

Partitioning the sphere $S_p(n,m)$ into Voronoi domains of
the lattice points it contains we see that RHS of (1) is
always a lower bound for every m.
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Introduction 


Assume a dimension n random vector source X. A rate r vector quantizer
is a mapping of X onto one of $2^{nr} = N$ representation vectors.
The VQ is defined by these reconstruction codevectors, $\hat{x}_j$, and their
associated quantization regions, $Q_j$. Consider the performance measure
of mean squared error per dimension. For minimum MSE each
codevector should be the centroid of its corresponding region. Further,
if the codevectors are the centroids, then the MSE expression
reduces to the difference between the input and output variances. For
the discussions below assume that X has independent elements whose
marginal density functions are unit variance, symmetric, and have
equal first absolute moment = γ.

Some recent work [4, 5] constructed VQ codebooks from the codewords
of binary linear block codes. Rather than a full-search implementation
we considered a syndrome-based mapping. Specifically,
taking the dual of antipodal modulation and syndrome decoding of an
(n, k) code, we considered the following rate k/n VQ implementation:
hard quantize each element of x to one bit (with threshold zero); compute
the syndrome of the resulting binary sequence and “correct” the
error; use the k information bits of this codeword as the VQ index;
reconstruct x based upon the k information bits. With this format
each quantization region was the union of $2^{n-k}$ orthants; the VQ
reconstruction vector was the centroid of its quantization region. For the
assumed source the complete symmetry of the problem resulted in all
regions being identical; hence, solving for $Q_1$ and $x_1$ (corresponding to
the all zero codeword) was sufficient to describe performance. Letting
$e_l = [e_{l,1}, \ldots, e_{l,n}]$, $l = 1, 2, \ldots, 2^{n-k} = M$, be the coset leaders of the
code, the n elements of $x_1$ and the resulting MSE were


For comparison, time-shared scalar quantization (zero or one bit
quantization per element) has MSE = $1 - r\gamma^2$.
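The syndrome-based mapping described above can be sketched for a small code. The (7, 4) Hamming code, the function names, and the sign convention (bit 0 for a nonnegative sample) are illustrative assumptions, not taken from the text. The sketch also assumes that the centroid of a single orthant has coordinates ±γ, which follows from the independence and symmetry of the source elements, so that the centroid of the all-zero region is an average of signs over the coset leaders and the MSE is the input variance minus the output variance.

```python
from itertools import product

# Parity-check matrix of the (7, 4) Hamming code (an illustrative choice);
# column i is the binary expansion of i + 1.
H = [[(col >> bit) & 1 for col in range(1, 8)] for bit in range(3)]
N_DIM, K_BITS = 7, 4


def syndrome(bits):
    """Syndrome of a length-7 binary sequence."""
    return tuple(sum(h * b for h, b in zip(row, bits)) % 2 for row in H)


def coset_leaders():
    """Minimum-weight error pattern for each of the 2^(n-k) syndromes."""
    leaders = {}
    for pattern in sorted(product((0, 1), repeat=N_DIM), key=sum):
        leaders.setdefault(syndrome(pattern), pattern)
    return leaders


def encode(x, leaders):
    """Hard-quantize x at zero, then 'correct' the result to a codeword."""
    bits = [1 if value < 0 else 0 for value in x]
    error = leaders[syndrome(bits)]
    return tuple(b ^ e for b, e in zip(bits, error))


def centroid_scale(leaders):
    """Centroid of the all-zero region, coordinate by coordinate, in units of gamma."""
    m = len(leaders)
    return [sum(1 - 2 * e[i] for e in leaders.values()) / m for i in range(N_DIM)]


leaders = coset_leaders()
scale = centroid_scale(leaders)
vq_coeff = sum(s * s for s in scale) / N_DIM   # MSE = 1 - vq_coeff * gamma^2
scalar_coeff = K_BITS / N_DIM                  # MSE = 1 - r * gamma^2
print(len(leaders), scale, vq_coeff, scalar_coeff)
```

Under these assumptions the Hamming example gives a coefficient of 0.5625 against 4/7 for time-shared scalar quantization, so this particular short code does not beat the scalar baseline; it is meant only to show the mechanics.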


Nonlinear  Codes 


An (n, N, d) nonlinear code consists of N codewords of length n with
minimum Hamming spacing d. Being nonlinear, the simplicity of syndrome
decoding and the complete symmetry of the $Q_j$ are lost. From
the perspective of examining a nonlinear code as a possible VQ codebook,
this forces us to describe each region $Q_j$ individually, computing
its probability of occurrence and centroid. Let $c_j$ be a typical
codeword and $d_H(\cdot,\cdot)$ be Hamming distance. Implementing the VQ
by hard quantizing the input and minimizing the Hamming distance
to a codeword, $\min_j d_H(c_j, \tfrac{1}{2}(1 + \mathrm{sgn}(x)))$, then each $Q_j$ is again the
union of orthants, $Q_j = \bigcup_{l=1}^{M_j} Q_{j,l}$, where the number of orthants, $M_j$,
for the j-th region ($\sum_{j=1}^{N} M_j = 2^n$), depends upon the detail of the
encoder implementation. Paralleling the analysis above for the linear
code case, the i-th coordinate of the centroid of the j-th region and
the overall MSE are


X 


7 

Mj 


MSE 


=  i_iy 


where $a_{j,i,l}$ depends upon the orientation of the i-th coordinate of the
l-th orthant of $Q_j$ with respect to the i-th element of $c_j$ (unity if they
match, zero if they don't).


As an example, consider the Nordstrom-Robinson (15, 256, 5) code
[2]. An algebraic decoding algorithm [3] results in all regions having
the same probability of occurrence (i.e., each $M_j$ = 128). The resulting
performance is MSE = 1 - 0.&51γ², slightly better than scalar
quantization (MSE = 1 - 0.533γ²). Although one could consider variations
of this code (extending to dimension 16 or shortening to 14, 13, and
12) the original dimension 15 version has the best MSE performance.


A  Negacyclic  Code  Example 

The first step in implementing the syndrome-based VQs consists of
mapping the source vector onto a discrete sequence suitable for algebraic
decoding. Our approach in the binary code case was to pair
each element of the source vector with a separate codeword position,
thresholding the source value (at zero) to produce a binary value. This
worked since Hamming distance and Euclidean distance are directly
related for binary sequences. To extend to q-ary codes (q > 2), two
mappings which come to mind are scalar quantization of each source
value to q levels (as in PAM) and partitioning of the bivariate plane
into q equi-angular regions (as in PSK). Although Hamming distance
does not directly reflect the Euclidean relationship of points in such
constellations, the Lee metric does. Since the match is better for the
PSK constellation, we pursue that mapping below.

Our example assumes an iid Gaussian source and the q = 5 (12, 8)
negacyclic code from [1, p. 209] ($r = (\log_2 5^8)/24 = 0.774$ bits/dim).
To map the source onto the $5^{12}$ 5-ary sequences, we take vectors
of length 24, breaking them into 12 pairs. Each pair uniquely defines
a phase angle in polar coordinates; uniform quantization of the
angle specifies the 5-ary symbol. For syndrome decoding there are
$5^4 = 625$ cosets which include the no error pattern, the 24 distance
one errors (a single ±1), the 288 distance two errors (a single ±2
or two ±1s), and 312 cosets corresponding to Lee metric equal to
3. For minimum MSE encoding we choose these remaining error
vectors of the form ±1, ±1, ±1. In the centroid computation, each
quantization region is the union of 625 dimension 24 cones. Completing
the computation (again, noting the symmetry of the problem),
the centroid of $Q_1$, corresponding to the all-zero codeword, is $x_1$ =
[447a, 0, 507a, 0, 543a, 0, 536a, 0, 544a, 0, 547a, 0, 548a, 0, 545a, 0, 551a, 0,
551a, 0, 558a, 0, 552a, 0] where $a = \sin(\pi/5)/(125\sqrt{2\pi})$. The overall MSE is
0.4935, again slightly better than scalar quantization (MSE = 0.5073).
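The coset counts and the quoted distortions can be checked numerically. The sketch below rests on several assumptions: the code has length 12 over the integers mod 5 with 5^4 cosets; the scaling constant is a = sin(π/5)/(125·sqrt(2π)), the value that reproduces the quoted MSE; the MSE per dimension is one minus the squared centroid norm divided by 24; and γ² = 2/π for a unit-variance Gaussian. All function and variable names are illustrative.

```python
import math

N_SYMBOLS, N_COSETS = 12, 5 ** 4


def lee_weight_counts(n, max_weight):
    """Number of length-n 5-ary vectors of each Lee weight up to max_weight."""
    # Per-symbol Lee weight enumerator over Z_5: one symbol of weight 0,
    # two of weight 1 (+-1) and two of weight 2 (+-2).
    counts = [1] + [0] * max_weight
    for _ in range(n):
        new = [0] * (max_weight + 1)
        for weight, count in enumerate(counts):
            for step, multiplicity in ((0, 1), (1, 2), (2, 2)):
                if weight + step <= max_weight:
                    new[weight + step] += count * multiplicity
        counts = new
    return counts


counts = lee_weight_counts(N_SYMBOLS, 2)
remaining = N_COSETS - sum(counts)   # cosets left for Lee-weight-3 leaders

# Nonzero centroid coefficients quoted in the text, one per symbol pair.
coeffs = [447, 507, 543, 536, 544, 547, 548, 545, 551, 551, 558, 552]
a = math.sin(math.pi / 5) / (125 * math.sqrt(2 * math.pi))
mse_vq = 1 - sum((c * a) ** 2 for c in coeffs) / 24

rate = 8 * math.log2(5) / 24          # bits per dimension
gamma_sq = 2 / math.pi                # squared first absolute moment of N(0, 1)
mse_scalar = 1 - rate * gamma_sq

print(counts, remaining, round(mse_vq, 4), round(mse_scalar, 4))
```

The counts of weight-one and weight-two patterns come out as 24 and 288, matching the text, and both distortions agree with the quoted values to about three decimal places.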

Abstract

We  examine  how  the  performance  of  a  memoryless  vector 
quantizer  (VQ)  changes  as  a  function  of  its  training  set  size.  By 
relating  the  training  distortion  of  such  a  codebook  to  its  test 
(true)  distortion,  we  demonstrate  that  one  may  obtain  “good” 
codebooks  at  a  fraction  of  the  computational  cost  by  training  on 
a  small  random  subset  of  the  blocks  in  the  target  image. 

Background 

For a system with a fixed number of degrees of freedom, one
may  bound  the  difference  between  the  error  of  that  system  on 
an  arbitrary  distribution  (a  test  set)  and  its  performance  on  a 
subset  of  that  distribution  (the  training  set).  Roughly,  with  fixed 
confidence,  this  difference  is  bounded  by 

</  I 

(fe.sf  -  (rain)  <  f)(  log  —  j,  (1) 

m  m 

where m is the size of the training set and d is the Vapnik-Chervonenkis
(VC) dimension of the system, a measure of its
number of degrees of freedom [1]. Empirically, it has been observed
that for some learning systems, the expected value of this
difference varies as O(d/m). By bounding the difference between
test and training errors, one can bound the difference between
the test error and the “optimal” error, i.e., the test error if the
system had been trained on an infinite amount of data.


Figure 1: Test and training distortion of codebooks trained on
small subsets of blocks drawn at random from a target image

Application  to  Vector  Quantization  (VQ) 

We have extended this theory to relate to the training of
vector quantizer codebooks. We have also conducted empirical
studies which have determined the effective VC dimension
of VQ codebooks. Our results quantitatively show that a VQ
codebook trained on a small random subset of vectors from a
target image performs almost as well at quantizing that image
as a codebook trained on the entire image, but at a fraction of
the computational cost [2] (Fig. 1). Some empirical results are
outlined below.

Given an image Z composed of M k-dimensional blocks, we
wish to design a k-dimensional codebook with N codewords. We
extract m k-dimensional blocks at random from Z (with or without
replacement), and use this training set $Z^m$ as input to the
GLA codebook design algorithm [3]. The distortion that the resulting
codebook imposes on $Z^m$ is our training distortion.


Figure  2:  (Test  —  train)  distortion  is  described  by  a  simple  func¬ 
tion  of  the  training  set  size. 

The  codebook  is  then  used  to  quantize  image  Z,  and  the  resulting 
test  distortion  is  measured.  As  predicted  by  theory,  the  test  and 
training  distortions  followed  a  simple  relationship  (Fig.  2): 

$$(\text{test} - \text{train}) = \frac{\alpha}{m + \beta} \qquad \text{(replacement)}$$

$$(\text{test} - \text{train}) = \frac{M - m}{M - 1}\,\frac{\alpha}{m + \beta} \qquad \text{(no replacement)}.$$

Parameter α is the learning complexity of the image for the
given codebook size, and β is an offset factor, which, for the most
part, we may ignore. We found that α varied very little between
natural images, but depended primarily on N. For normalized
mean-squared distortion of 8-bit grayscale images, we
found a “typical” learning complexity of

$$\alpha(N) = 0.363\,N^{0.79}$$

Let us say that we want our codebook’s distortion to be
within 3% of its asymptotic distortion. If N = 512 vectors,
α ≈ 50. Solving (test - train) = 0.03 ≈ α/m for m gives
m = 1672 blocks. This indicates that if we train our 512-vector
codebook on 1672 blocks drawn at random from an image (or set
of images), we can expect that the codebook’s performance on
that entire image will be within 3% of the performance that we
would get if we were to train the codebook on the entire image
(or set of images), regardless of how large the image (set) is!
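The arithmetic behind the 1672-block figure can be sketched as follows. The sketch assumes the fitted law α(N) = 0.363·N^0.79 (this reading of the exponent is an assumption, chosen because it reproduces m = 1672), the two distortion-gap relations α/(m + β) and (M − m)/(M − 1)·α/(m + β), and a negligible offset β. All function names are illustrative.

```python
import math


def learning_complexity(n_codewords, scale=0.363, exponent=0.79):
    """Typical learning complexity alpha(N) for a codebook of N codewords."""
    return scale * n_codewords ** exponent


def gap_with_replacement(m, alpha, beta=0.0):
    """(test - train) distortion gap when sampling with replacement."""
    return alpha / (m + beta)


def gap_without_replacement(m, total_blocks, alpha, beta=0.0):
    """(test - train) gap when sampling m of M blocks without replacement."""
    return (total_blocks - m) / (total_blocks - 1) * alpha / (m + beta)


def training_size(n_codewords, target_gap):
    """Smallest m with alpha / m <= target_gap, ignoring the offset beta."""
    return math.ceil(learning_complexity(n_codewords) / target_gap)


alpha_512 = learning_complexity(512)
m_needed = training_size(512, 0.03)
print(round(alpha_512, 2), m_needed)
```

With sampling without replacement the gap shrinks further as m approaches M and vanishes at m = M, which is the behaviour the finite-population factor (M − m)/(M − 1) encodes.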

These results are for a particular implementation of GLA,
and the exact dependence of α on N will vary with different
implementations and different input domains. We have obtained
slightly different results using simulated annealing to design codebooks.
Still, for a given training regimen, it is straightforward for
users to determine the typical learning complexities of their particular
problems, and then use these values to select appropriately
small training set sizes.

[1] V. Vapnik, Estimation of dependencies based on empirical
data. Springer-Verlag, New York, 1982.

[2] D. A. Cohn, “Separating formal bounds from practical performance
in learning systems,” Ph.D. dissertation, Univ.
Washington Computer Science & Eng., 1992.

[3] Y. Linde, A. Buzo, and R. M. Gray, “An algorithm for
vector quantizer design,” in IEEE Transactions on Communications,
28:84-95, 1980.

A group code C is a subgroup of a direct product sequence space
$G^I$, where G is a group and $I \subseteq \mathbb{Z}$ is an index set. In other words,
a codeword $c \in C$ is a sequence $c = \{c_k \in G : k \in I\}$ of elements of
G. Though G may in general be nonabelian, we refer to the group
operation of G as a “sum” and its identity element as 0. The group
property of C thus ensures that the componentwise sum of any two
codewords is another codeword.

For the purposes of this abstract, we assume that $I = \mathbb{Z}$, so that
codewords  are  bi-infinite  sequences,  and  that  C  is  time-invariant,  so 
that  shifts  of  codewords  are  codewords.  These  two  assumptions  serve 
primarily  to  simplify  notation.  We  also  require  the  technical  condi¬ 
tion  that  C  be  a  closed  set  in  the  topology  of  pointwise  convergence. 

Let $C_J$ denote the set of sequences of C that are zero outside the
subset $J \subseteq \mathbb{Z}$. So, for example, $C_{[m,n)}$ denotes the set of sequences
that are possibly nonzero only on the interval [m, n). The set $C_J$ is
a normal subgroup of C. In [1] it was shown that any group code C
has a well-defined state group $\Sigma_k = C/(C_{(-\infty,k)} C_{[k,\infty)})$ at each time
$k \in \mathbb{Z}$, and that C has a canonical state/output realization whose
state space at each time k is $\Sigma_k$. The canonical realization identifies
each code sequence c with a unique state sequence σ(c), so that one
may refer unambiguously to the state $\sigma_k(c)$ of a sequence $c \in C$ at
each time k. The canonical realization is minimal.

It was further shown that a minimal feedforward encoder for C
can be constructed in controller canonical form from elementary
constituents of the state group called “granules.” The controller
canonical form describes code sequences as combinations of
finite-length generator sequences.

In this paper we give a dual construction based on a state observer
that recovers the state of a code sequence $c \in C$ at time k from a coset
decomposition of recent outputs $c_{k-m}, \ldots, c_{k-1}$. The state observer
is used to construct a syndrome former and a minimal encoder in
observer canonical form that describes code sequences in terms of
“parity checks” that must be satisfied.

A syndrome map for a group code C is a function $f: G^{\mathbb{Z}} \to S$
whose kernel $f^{-1}(0)$ is C, where the syndrome set $S = f(G^{\mathbb{Z}})$ is a
set that contains the element 0. If f is a syndrome map for C, then
a sequence $c \in G^{\mathbb{Z}}$ is a codeword of C if and only if f(c) = 0.

A syndrome map $f: G^{\mathbb{Z}} \to G^{\mathbb{Z}}/\!/C$ for a group code C may be
constructed by setting f(g) = gC for $g \in G^{\mathbb{Z}}$, where the syndrome
set S is the set $G^{\mathbb{Z}}/\!/C$ of left cosets of C in $G^{\mathbb{Z}}$ and the “identity”
coset C of $G^{\mathbb{Z}}/\!/C$ is identified as the element 0 of S. This map is a
homomorphism if C is a normal subgroup of $G^{\mathbb{Z}}$; then the syndrome
set is the quotient group $G^{\mathbb{Z}}/C$. More generally, there exists a
homomorphism $f: G^{\mathbb{Z}} \to S$ with kernel C, i.e., a homomorphic syndrome
map, if and only if C is a normal subgroup of $G^{\mathbb{Z}}$.
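The link between normality and homomorphic syndrome maps can be illustrated on a finite toy case. The sketch replaces the sequence space by the symmetric group on three letters and the code by one of its subgroups; this finite stand-in, and all names below, are assumptions made for illustration only. It forms the coset map f(g) = gC and tests whether it respects products.

```python
from itertools import permutations


def compose(p, q):
    """Product of permutations stored as tuples: (p * q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def left_coset(g, subgroup):
    """The syndrome f(g) = gC, represented as a set of group elements."""
    return frozenset(compose(g, c) for c in subgroup)


def is_normal(subgroup, group):
    return all(compose(compose(g, c), inverse(g)) in subgroup
               for g in group for c in subgroup)


def coset_map_is_homomorphic(subgroup, group):
    """True if (gC)(hC) equals (gh)C for every pair g, h."""
    for g in group:
        for h in group:
            product_set = frozenset(compose(x, y)
                                    for x in left_coset(g, subgroup)
                                    for y in left_coset(h, subgroup))
            if product_set != left_coset(compose(g, h), subgroup):
                return False
    return True


S3 = list(permutations(range(3)))
A3 = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}   # rotations: a normal subgroup
H2 = {(0, 1, 2), (1, 0, 2)}              # generated by one swap: not normal

for name, code in (("A3", A3), ("H2", H2)):
    kernel = {g for g in S3 if left_coset(g, code) == frozenset(code)}
    print(name, is_normal(code, S3), coset_map_is_homomorphic(code, S3), kernel == code)
```

In both cases the elements mapped to the identity coset are exactly the subgroup, so the coset map is always a syndrome map; it is a homomorphism only for the normal subgroup.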

We are thus motivated to investigate group codes that are normal
subgroups of their parent sequence space, which we call “normal
codes.” All abelian codes are normal. More generally, we show that
C is normal if and only if C has abelian dynamics, i.e., if and only
if its state group $\Sigma_k = C/(C_{(-\infty,k)} C_{[k,\infty)})$ is abelian at each time
$k \in \mathbb{Z}$.


The parallel transition subgroup $\prod_{k \in \mathbb{Z}} C_{[k,k]}$ of a normal code
must include the commutator subgroup of C, so $C_{[k,k]}$ must include
the commutator subgroup of $A_k = \{c_k : c \in C\}$. When the output
sequence space is nonabelian, therefore, there exist nontrivial parallel
transition subgroups; these impose upper bounds on the minimum
distance of C.

A code is μ-observable if, given any two code sequences $c, c' \in C$
that agree on a length-μ interval [k, k + μ), the sequence c'' defined
by

$$c''_i = \begin{cases} c_i & \text{if } i < k, \\ c'_i & \text{if } i \ge k \end{cases}$$

is a code sequence in C. In other words, if two code sequences agree
on an interval of width at least μ, then the past of one can be
concatenated with the future of the other. The least such μ is the
observability index of C. A 0-observable system is memoryless.

A syndrome former for a code $C \subseteq G^{\mathbb{Z}}$ is an input/state/output
dynamical system with input space $G^{\mathbb{Z}}$ and output space $S^{\mathbb{Z}}$ whose
output sequence is 0 if and only if the input sequence is in C. A
syndrome former has memory m if the output $s_k$ at any time $k \in \mathbb{Z}$
can be expressed as a function of the previous m and the current
inputs, $c_{k-m}, \ldots, c_k$.

We show that a syndrome former for a code with observability
index μ must have memory m ≥ μ. We also show that a syndrome
former must contain an underlying state-output realisation of C; the
syndrome former is minimal if and only if this underlying realisation
is minimal. As group codes have essentially unique minimal realisations,
the state spaces of a minimal syndrome former are essentially
unique and correspond to the state spaces $\Sigma_k$ of C. A minimal syndrome
former must therefore track the state sequence of C: syndrome
formers are inherently state observers.

Given a μ-observable group code C, we show how to construct a
minimal syndrome former for C with memory μ. The construction
is based on a decomposition of the state group into dual granules.

At each time k the state group $\Sigma_k$ has a coset decomposition into
dual  granules 

S*  0  n 

>=0 

such that the value of a granule depends only on outputs

..., $c_{k-1}$. An observer constructed from this decomposition is
necessarily feedforward with memory μ.

From this system one may construct a minimal syndrome former,
a minimal feedforward inverter, and a minimal encoder in observer
canonical form.

An isometry of n-dimensional Euclidean space $\mathbb{R}^n$ is a bijection
$\mathbb{R}^n \to \mathbb{R}^n$ that preserves Euclidean distance. An isometry code
C is a subgroup of a direct product $G^I$, where G is a group of
isometries of $\mathbb{R}^n$. In other words, a codeword $c \in C$ is a sequence
$c = \{c_k \in G : k \in I\}$ of isometries of $\mathbb{R}^n$. A codeword is therefore an
isometry of infinite-dimensional Euclidean space $(\mathbb{R}^n)^I$. Isometry
codes are a type of group system, and can be analysed using the
theory developed in [2].

Many useful trellis codes may be described as the orbit Cx of a
sequence $x \in (\mathbb{R}^n)^I$ under an isometry code C. Certain aspects of
isometry codes are studied, under different terminology, by Forney [1]
and Loeliger [3]. Trellis codes generated by isometry codes are
geometrically uniform; thus, when used for data transmission over an
additive white Gaussian noise channel with maximum likelihood
decoding, the probability of error is independent of the transmitted
codeword. This property greatly simplifies performance analysis.

We develop a method for realising trellis codes as isometry codes.
First, the states of a minimal trellis for the code are assigned a group
structure that is “consistent” with the trellis branches and labels.
The Euclidean subset label on each trellis branch is then replaced
with a coset of isometries that generates the Euclidean subset from
a distinguished initial point $s \in \mathbb{R}^n$. The resulting trellis defines an
isometry code that, when applied to the initial sequence $x = \{x_k = s : k \in \mathbb{Z}\}$,
yields the original trellis code.

In more detail, let C be an isometry code, and let Cx be a
corresponding geometrically uniform trellis code. We assume that the
signal set $S \subset \mathbb{R}^n$ of the trellis code is partitioned into n cells
(subsets) $S_0, \ldots, S_{n-1}$, and that C and Cx have the same minimal trellis
diagram. Let Σ be the state group of C, and let the branch group B
be the set of all state pairs $(\sigma_1, \sigma_2) \in \Sigma \times \Sigma$ that are connected by a
trellis branch. Assume without loss of generality that $S_0$ is the cell
assigned to the branch from the identity state to the identity state,
and define $B_i$ to be the set of branches labeled with the partition cell
$S_i$.

We show that the Euclidean and isometry labelings of the minimal
trellis are related as follows: first, $B_0$ is a subgroup of the branch
group B, and the set $B_i$ of branches labeled with partition cell $S_i$ is a
left coset of $B_0$ in B. Second, for any branch $b \in B$, if $bB_i = B_j$ then
the coset of isometries assigned to branch b sends cell $S_i$ to $S_j$. These
two conditions completely characterise the possible state groups and
isometry labelings consistent with a particular Euclidean trellis.

The first of these conditions is satisfied by any trellis code
described as the combination of a binary linear convolutional code and
a mapping from coded bits to partition cells. The second condition,
however, holds only if the mapping from bits to cells respects the
symmetries of the partition.

For  example,  we  show  that  a  particular  sixteen-state  Wei  code, 
defined  over  an  8-way  partition  of  the  integer  lattice  Z*,  has  a  rep¬ 
resentation  as  an  isometry  code.  The  code  is  defined  by  a  rate-2/3 
binary linear convolutional code followed by an unusual mapping
from coded bits to partition cells. We show that this mapping
satisfies the conditions presented above by finding a group of 8 cosets


of  isometries  of  A^  that  generates  the  partition  and  is  isomorphic  to 
(Zj)*- 

As a second example, we show that Wei’s nonlinear eight-state
trellis code, specified in the CCITT V.32 standard, also has a
representation as an isometry code. Ignoring the edge effects of the finite
signal set, the V.32 code is therefore geometrically uniform. The
signal set for the V.32 trellis code is a translate of the 8-way lattice
partition $R\mathbb{Z}^2/4\mathbb{Z}^2$. Isometries of $\mathbb{R}^2$ are denoted as follows: $t_{(a,b)}$ is
translation by (a, b), $r_\theta$ is rotation by θ degrees clockwise about the
origin, $v_1$ is reflection across the line $\{(0, y) : y \in \mathbb{R}\}$, $v_2$ is reflection
across the line $\{(-y, y) : y \in \mathbb{R}\}$, and 1 is the identity isometry.
The V.32 trellis code is then described as the orbit of the sequence
x = {..., (0, 1), (0, 1), ...} under the isometry code C generated by
the time shifts of the sequences

9i  =  (-•-.lil.f(jj)if(j,»)»‘iso,  1,1.---). 

Bj  =  (■  -  •»  1,  l,f(j,o)<®J)f(o,r)®ii  If  •  • -)- 

This description may be converted into a trellis diagram by
replacing isometries by their permutation actions on the cells of the
partition $R\mathbb{Z}^2/4\mathbb{Z}^2$. This converts the isometry code C, defined over
an infinite isometry alphabet, into a code C' defined over a finite
alphabet of permutations. The methods developed in [2] may then
be applied.

The state group of the V.32 isometry code is nonabelian, and is
isomorphic to the dihedral group $D_4$. Each of the 32 trellis branches
is assigned a distinct coset of isometries, yet there are only 8 partition
cells. The map from isometries to partition cells is therefore
many-to-one, a property peculiar to isometry codes with nonabelian state
groups. The isometry code contains the constant rotation sequences
$(\ldots, r_{90}, r_{90}, r_{90}, \ldots)$, which is a sufficient condition for 90-degree
rotational invariance.

As a final example, we show that the (16,8,6) binary nonlinear
Nordstrom-Robinson code may be represented as a block isometry
code over a group of rotations of $\mathbb{R}^2$, or equivalently as a ring code
over $(\mathbb{Z}_4)^8$. The binary code is embedded in $\mathbb{R}^{16}$ by interpreting
codewords as vertices of a 16-cube.

Summary: Any discrete subset of $\mathbb{R}^N$,
where N is any positive integer, is called a
signal set. A signal set may be finite or infinite. A
bijective map from $\mathbb{R}^N$ to itself, which preserves
Euclidean distance, is called an isometry of $\mathbb{R}^N$.
The set of all isometries of $\mathbb{R}^N$ which leave a
signal set $S \subset \mathbb{R}^N$ invariant forms a group with
respect to composition, called the symmetry
group of S and denoted by Γ(S). In 1991
G. D. Forney [1] introduced geometrically uniform
signal sets.

Definition 1: A signal set S is said to be geometrically
uniform if Γ(S) acts transitively on S.

In the same year H.-A. Loeliger [3] introduced
signal sets matched to groups.

Definition 2: A signal set S is said to be
matched to a group G if there is a surjective map
μ from G to S such that, for all g and g' in G,

$$d(\mu(g), \mu(g')) = d(\mu(g^{-1} g'), \mu(e)), \qquad (1)$$

where d denotes the Euclidean distance and e denotes
the identity element of G.
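The matching condition can be checked by brute force on small examples. The sketch below assumes the condition d(μ(g), μ(g')) = d(μ(g⁻¹g'), μ(e)) and takes G to be the cyclic group of integers mod m, written additively so that g⁻¹g' becomes (g' − g) mod m. The two maps, m-PSK and 4-PAM, and all names are illustrative choices, not examples from the text.

```python
import cmath


def psk_map(m):
    """mu: Z_m -> m-PSK points on the unit circle (as complex numbers)."""
    return lambda g: cmath.exp(2j * cmath.pi * g / m)


def pam_map(g):
    """mu: Z_4 -> 4-PAM points -3, -1, 1, 3 on the real line."""
    return complex(2 * g - 3, 0.0)


def is_matched(m, mu, tol=1e-9):
    """Check d(mu(g), mu(h)) == d(mu(h - g), mu(0)) for all g, h in Z_m."""
    for g in range(m):
        for h in range(m):
            lhs = abs(mu(g) - mu(h))
            rhs = abs(mu((h - g) % m) - mu(0))
            if abs(lhs - rhs) > tol:
                return False
    return True


print(is_matched(8, psk_map(8)), is_matched(4, pam_map))
```

The PSK constellation passes because rotating both points by the same angle preserves their distance. The PAM labelling fails, for example for g = 3 and g' = 0, where the outer points are six units apart but the labels 1 and 0 map to points two units apart. The check only shows that this particular cyclic labelling of 4-PAM is not a matching.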

The  purpose  of  this  note  is  to  show  that  these 
two  concepts  coincide.  The  case  when  the  signal 
set  is  finite  was  proved  by  Loeliger  [3]. 

A bijective map f from a signal set S to itself is
called an isometry of S, if for all s and s' in S,

$$d(f(s), f(s')) = d(s, s'). \qquad (2)$$


Corollary 2: Let S be a signal set in $\mathbb{R}^N$ and
assume that Span S = $\mathbb{R}^N$. Then any symmetry
of S can be extended uniquely to an isometry of
$\mathbb{R}^N$.

Theorem 3: Let S be a signal set in $\mathbb{R}^N$. Then
S is a geometrically uniform signal set if and only
if S can be matched to a group.

Corollary 4: Let S be a signal set matched to
a group G. Assume that S spans $\mathbb{R}^N$. Then G is
homomorphic to a transitive subgroup of Γ(S).

Lemma 1: Any symmetry of a signal set $S \subset \mathbb{R}^N$
can be extended to an isometry of $\mathbb{R}^N$.

All known existence proofs for capacity-achieving
codes whose algebraic structure is
at least a group rely on averaging arguments
for linear codes over finite fields, except,
seemingly, de Buda’s proof [1] for lattice
codes. De Buda’s starting point is, instead,
the Minkowski-Hlawka theorem from geometric
number theory. We remove this anomaly
by showing that the standard proof of that
theorem has a natural interpretation as an
averaging argument for linear codes in the
following setup.

We consider the discrete-time Gaussian
channel with p-level amplitude modulation, p
prime, and hard-decision ‘mod-p demodulation’,
i.e., the received signal is reduced mod-p
and quantized to the nearest integer. (The
approach can be extended to soft-decision,
however.) We thus have created a channel with
mod-p additive noise, for which linear codes
over GF(p) are the natural choice. Moreover,
existence results for linear codes imply
corresponding results for the associated mod-p
lattices, as is demonstrated by some examples.
In particular, the Minkowski-Hlawka theorem
is shown to follow from a Gilbert-Varshamov-type
argument for linear codes.

Having  thus  seen  that  the  first  step  of  de 
Buda’s  proof  can  be  derived  with  averaging 
arguments  for  linear  codes,  it  is  natural  to  ask 
whether  an  alternative,  more  direct,  existence 
proof  for  good  lattice  codes  can  be  based  on 
such  arguments,  which  we  show  is  indeed  pos¬ 
sible. 

It is shown that spherically shaped cosets
of linear codes over GF(p) achieve the capacity
of almost every channel with p inputs,
each associated with a certain cost, under
a constraint on the total cost of each codeword.
Application of this result to the Gaussian
channel with p-level amplitude modulation
implies, in the limit for p → ∞, the coding
theorem for the corresponding mod-p lattice
codes. (Such a limit p → ∞ is also part of
the standard proof of the Minkowski-Hlawka
theorem and thus implicitly contained in de
Buda’s proof.)

An unsatisfactory point of our proof (as
well as of de Buda’s, even in its corrected
version [2]) is that the upper bound on error
probability holds only for the average over all
codewords of the code; if the code is ‘cleaned’
by deleting weak codewords, then its algebraic
structure is destroyed. We thus conclude
by emphasizing that there is no proof known
that reasonably shaped lattice codes without
weak codewords can achieve the capacity of
the Gaussian channel at any finite SNR.

Abstract

Error-correcting channel codes designed over the real field are
advantageous over finite-field codes in certain cases. Real-field
codes are sensitive to small deviations from the ideal continuous
values as well as to large errors which are desired to be removed. The
former are caused by things like finite precision, receiver noise, or
channel noise. A special channel model is used which includes both
background and impulsive noises. Real-field codes can “correct” up to
(N - K - 1) impulsive errors per word if the ratio of noise powers
is large enough.


Summary 


The Background and Impulsive Noise (BIN) channel has additive
white Gaussian background noise which affects every vector component,
while another independent Gaussian noise is switched on with
a probabilistic switch. The BIN channel has a larger capacity for
a continuous Gaussian input than for a finite-alphabet input, thus
motivating the use of real-field codes on this channel. Large alphabet
inputs approach the capacity of the continuous input channel. Block
channel coding is accomplished by a linear transformation on the
information symbols, c = Gu, where Euclidean distance is preserved
by requiring $G^T G = I$.

The  optimal  minimum  mean-squared-error  estimator  (MMSEE) 
is  the  best  decoding  algorithm  when  the  decoded  MSE  is  to  be  min¬ 
imized.  It  has  the  form: 


u  =  ENv  =  ^  = 


(1) 


uW  =  E[«|»  =  £,s  =  s,.] 


(2) 


u is the input vector; v is the output vector;

s is the impulsive-error location pattern (e.g., s = (0, 1, 0, 1, 0)).

Unfortunately, it requires an exhaustive search over all $2^N$ possible
impulsive-error location patterns, $s_j$. A lower bound shows that the
best decoding MSE per component cannot be much smaller than the
white Gaussian background noise variance. The ratio of

impulsive-error  variance  to  background  noise  variance  is  critical  to 
the  performance  of  the  decoding.  Larger  ratios  of  (0^/0’)  permit  a 
decoding  MSE  which  is  closer  to  the  background  noise  level. 

Using a parity-check matrix, which is orthogonal to the generator
matrix G, eliminates the need for knowledge of the source,
but at the expense of increased MSE. The optimal MMSEE_s for the
syndrome subspace is difficult to derive analytically. The estimator
is developed by using a form similar to the MMSEE, but which
uses an indirect estimate of the source word.


Jl  = 


Again,  a  lower  bound  can  be  derived  which  shows  that  no  decod¬ 
ing  which  uses  this  syndrome  method  can  have  a  MSE  less  than  the 
background  noise  variance.  It  also  predicts  the  MSE  for  each  possi¬ 
ble  error-pattern.  The  MSE  increases  with  the  number  of  estimated 
errors.  The  MSE  performance  of  generator-parity  check  matrix  pairs 
can be quantified. This decoding algorithm also requires an exhaustive
search of the $2^N$ possible impulsive-error location patterns.


Some applications require that the important performance criterion
be the probability that impulsive errors are correctly detected.
We define an impulsive error as any channel noise with amplitude
greater than some multiple of the background noise standard deviation,
$\sigma_0$. The optimal estimators for the error-location patterns $s_j$
are the MAP decision rules in the codeword space and the syndrome
subspace, which are given by:

$\max_j [p_{s|v}(s_j \mid v)] = \max_j [p_{v|s}(v \mid s_j) \cdot \Pr[s = s_j]]$    (4)

$\max_j [p_{s|z}(s_j \mid z)] = \max_j [p_{z|s}(z \mid s_j) \cdot \Pr[s = s_j]]$    (5)

Several interesting trends can be observed in the performance of
the real-codes when constant energy codes are compared at different
rates. Larger ratios of impulse variance to background
variance improve the error location estimation. The syndrome MAP_s
estimator can "correct" up to (N - K - 1) errors in a word if the
ratio of impulse variance to background variance is large enough.
Short codelengths can "correct" some of the (N - K) patterns. The
idea of "correct" means to exactly estimate impulsive errors when no
background noise is present, or with a high probability of being
within some $B\sigma_0$ of the impulsive amplitudes when background
noise is present. The optimal MAP estimator can "correct" more than
the MAP_s, up to N impulsive errors in a word, but this amount
decreases as the input power increases until an infinite input power
equates it to the performance of the MAP_s estimator. It should be
noted that long codelengths can make the percentage of words with
(N - K - 1) or fewer impulsive errors as close to 100% as desired.


In many applications only very small decoding errors are considered
correct, while any larger errors are all considered just as bad.
Instead of using the MMSEE decoders, an alternative is to use the
Largest Density (LD or LD_s) decoder, which first selects the
error-location pattern $s_j$ that corresponds to the largest conditional
a posteriori density (MAP or MAP_s). Then it uses the corresponding
conditional estimate to estimate the source word. The
LD decoders will have a higher MSE, but they will have a higher
percentage of decoded words with very small errors. If the
error-location pattern estimate is correct, then the error is essentially
the background noise. If the pattern estimate is wrong, then the error
is very large. Thus the decoding error can be made to be near the
background variance for words with (N - K - 1) or fewer impulsive
errors if the ratio of impulse variance to background variance is
large enough.
On Minimality Conditions for Linear Systems
and  Convolutional  Codes 


Hans- Andrea  Loeliger 
S-58183 Linköping, Sweden

The  recent  generalizations  of  convolutional 
codes  to  rings  [1],  [2]  and  groups  [3],  [4]  have 
redirected  some  attention  to  the  old  topic  of 
characterizing  minimal  and  catastrophic  en¬ 
coders  and  how  these  notions  are  related  to 
the  minimality  concept  for  linear  systems.  We 
address these questions in a universal-algebra
framework  that  treats  codes  over  groups,  rings, 
and  fields  in  a  unified  way. 

Any  standard  description  of  a  convolutional 
encoder  leads  to  a  state-transition  diagram 
(or  ‘transition  graph’  [4])  whose  branches  B 
form a subspace of S × Y × S, where S is the
state space and Y is the code symbol (encoder
output) by which the branch is labeled. (For
convolutional codes over groups, the branches
form a subgroup of the direct product S × Y ×
S, where S and Y are groups.)

A branch (s, 0, s') ∈ B is called left-neutral
(right-neutral) if s = 0 (s' = 0); it is two-sided
neutral if it is part of a zero loop. It will be
assumed that the state space (state group) S
satisfies the so-called descending-chain
condition, which is a generalization of finite
dimensionality to modules and groups. (This
condition is always satisfied for finite S.)

Theorem:  Each  of  the  following  conditions  is 
equivalent  to  the  minimality  of  the  transition 
graph: 

1.  No  state  other  than  the  zero  state  is  the 
ending  or  starting  state  of  a  semi-infinite 
path  all  of  whose  labels  are  zero. 

2.  The  set  of  left-neutral  branches,  the  set 
of  right-neutral  branches,  and  the  set 
of  two-sided  neutral  branches  all  consist 
only of the zero-branch.

3. There exists a nonnegative integer L such
that  any  length-L  path  segment  is  uniquely 
determined  by  its  label  sequence. 
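
Condition 1 can be checked mechanically on a small example. The sketch below is an illustration only: it assumes binary feedforward shift-register encoders with a finite state set, and the helper names (`branches`, `survivors`, `satisfies_condition_1`) are hypothetical, not taken from the paper. In a finite graph, a state starts (ends) a semi-infinite all-zero path exactly when it survives repeated pruning of states that have no outgoing (incoming) zero-labeled branch.

```python
from itertools import product


def branches(n_state_bits, output_fn):
    """Enumerate branches (state, label, next_state) of a binary shift-register encoder."""
    result = []
    for state in product((0, 1), repeat=n_state_bits):
        for u in (0, 1):
            label = output_fn(u, state)
            next_state = (u,) + state[:-1]  # shift the new input bit into the register
            result.append((state, label, next_state))
    return result


def survivors(edges, states):
    """States from which an arbitrarily long walk along `edges` exists."""
    alive = set(states)
    changed = True
    while changed:
        changed = False
        for s in list(alive):
            if not any(a == s and b in alive for a, b in edges):
                alive.discard(s)
                changed = True
    return alive


def satisfies_condition_1(branch_list):
    """True if only the zero state starts or ends a semi-infinite all-zero path."""
    states = {s for s, _, _ in branch_list}
    zero_edges = [(s, t) for s, y, t in branch_list if not any(y)]
    forward = survivors(zero_edges, states)                       # starting states
    backward = survivors([(t, s) for s, t in zero_edges], states)  # ending states
    zero_state = tuple(0 for _ in next(iter(states)))
    return forward | backward == {zero_state}


# Illustrative encoders (assumed examples): generators (1+D+D^2, 1+D^2) and (1+D, 1+D^2).
good = branches(2, lambda u, s: (u ^ s[0] ^ s[1], u ^ s[1]))
bad = branches(2, lambda u, s: (u ^ s[0], u ^ s[1]))
print(satisfies_condition_1(good), satisfies_condition_1(bad))
```

The second encoder has a zero-labeled self-loop at the all-ones state, so the check fails for it, which matches its well-known catastrophic behavior.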

Most  other  published  minimality  conditions 
for  convolutional  encoders  can  easily  be  de¬ 
rived  from  these  conditions. 

If the branches are labeled with input-output
pairs rather than with output symbols
only, then the above conditions characterize
the minimality with respect to the transfer
function, which is the traditional viewpoint in
system theory [5], and the standard minimality
condition for realizations of linear transfer
functions (minimal ⇔ controllable and
observable) is an easy consequence of the
theorem.
In many speech and image coding schemes, some of the coded bits
are  extremely  sensitive  to  channel  errors  while  some  others  exhibit  very 
little  sensitivity.  In  order  to  make  the  best  use  of  channel  redundancy, 
unequal  error  protection  (UEP)  codes  are  needed.  In  a  bandlimited 
environment,  such  coding  and  the  modulation  should  be  integrated. 
In  this  work,  we  propose  two  combined  UEP  coding  and  modulation 
schemes. 

The  first  method  multiplexes  different  coded  signal  constellations, 
with  each  coded  constellation  providing  a  different  level  of  error  pro¬ 
tection. The novelty here is that a codeword specifies the multiplexing
rule  and  the  choice  of  the  codeword  from  a  fixed  codebook  is  used  to 
convey  additional  important  information.  The  decoder  determines  the 
multiplexing  rule  before  decoding  the  rest  of  the  data. 

The second method is based on partitioning a signal constellation
into disjoint subsets, where the most important data sequence is
encoded, using most of the available redundancy, to specify a sequence
of subsets. The partitioning and code construction are done to
maximize the minimum Euclidean distance between two different valid
subset sequences. This leads to novel ways of partitioning the signal
constellations into subsets. Finally, the less important data selects a
sequence  of  signal  points  to  be  transmitted  from  the  subsets.  A  side 
benefit  of  the  proposed  set  partitioning  procedure  is  a  reduction  in  the 
number  of  nearest  neighbors,  sometimes  even  over  the  encoded  signal 
constellation. 

Many of the codes we have designed provide virtually error-free
transmission (greater than 6 dB coding gain) for some fraction (for
example, 25%) of the data while providing a coding gain of 1 to 2 dB
for  the  remaining  data  with  respect  to  uncoded  transmission.  The  two 
methods  can  also  be  combined  to  realize  new  coded  signal  constella¬ 
tions  for  unequal  error  protection. 
SUMMARY

Unequal  error  protection  (UEP)  codes  [1]  find 
applications  in  broadcast  channels,  as  well  as  in 
some  digital  communication  systems,  where 
messages  have  different  degrees  of  importance.  In 
this  paper,  we  propose  to  use  binary  linear  UEP 
(LUEP)  codes,  in  combination  with  a  QPSK  signal 
set  and  Gray  mapping,  to  obtain  new  efficient  block 
QPSK  modulation  codes  with  unequal  squared 
Euclidean  distances.  We  present  several  examples 
of  QPSK  block  modulation  codes  that  have  the  same 
minimum  squared  Euclidean  distance  (MSED)  as  the 
best  QPSK  block  modulation  codes  of  the  same 
length and rate. In the proposed new constructions
of  QPSK  block  modulation  codes,  even-length  binary 
LUEP  codes  are  used.  It  is  shown  that  good  LUEP 
QPSK  block  modulation  codes  are  obtained  by 
combining shorter (simpler to encode and decode)
binary linear codes using the well-known |u|u+v|
construction or the so-called construction X. Both
constructions  have  the  advantage  of  yielding 
optimal  or  near  optimal  binary  LUEP  codes  of  short 
to  moderate  lengths,  using  very  simple  constituent 
codes,  and  may  be  used  as  component  codes  in  the 
proposed  constructions  of  QPSK  modulation  codes. 
In  addition,  LUEP  codes  lend  themselves  quite 
naturally to multi-stage decoding [4], using the
decodings of the component codes. In this paper, we
present a new suboptimal two-stage soft-decision
decoding of binary LUEP codes and apply it to the
proposed  constructions  of  LUEP  QPSK  block 
modulation  codes. 
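
As a small illustration of how the |u|u+v| construction produces unequal protection, the sketch below combines two toy constituent codes and computes the resulting separation vector by brute force. The constituent codes (the whole space of length 4 and the length-4 repetition code) and the function names are assumptions chosen for illustration; they are not the codes used in the paper.

```python
from itertools import product


def span(generator_rows):
    """All codewords of the binary linear code spanned by the given generator rows."""
    n = len(generator_rows[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(generator_rows)):
        word = [0] * n
        for c, row in zip(coeffs, generator_rows):
            if c:
                word = [a ^ b for a, b in zip(word, row)]
        words.add(tuple(word))
    return words


def u_u_plus_v(u, v):
    """Codeword |u|u+v| built from one word of each constituent code."""
    return u + tuple(a ^ b for a, b in zip(u, v))


def separation(code_u, code_v):
    """Minimum weights over codewords with v != 0 and with u != 0, respectively."""
    s_v = min(sum(u_u_plus_v(u, v)) for u in code_u for v in code_v if any(v))
    s_u = min(sum(u_u_plus_v(u, v)) for u in code_u for v in code_v if any(u))
    return s_v, s_u


# Toy constituent codes (assumed for illustration only).
full_space = span([(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)])  # (4,4,1)
repetition = span([(1, 1, 1, 1)])                                            # (4,1,4)

combined = {u_u_plus_v(u, v) for u in full_space for v in repetition}
print(len(combined), separation(full_space, repetition))
```

Here the single message bit carried by v is protected by Hamming distance 4, while the four bits carried by u see distance 2, so the (8,5) code has separation vector (4, 2).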


Constructions  via  Gray  mapping 

In  a  QPSK  signal  constellation  with  Gray  mapping 
between  labels  and  signal  points,  the  squared 
Euclidean  distance  between  signal  points  is  twice 
the  Hamming  distance  between  their  corresponding 
labels.  We  say  that  this  QPSK  signal  constellation 
forms a second-order Hamming space. Our proposed
new  construction  consists  of  a  Gray  mapping 
between  two-bit  blocks  and  signal  points  in  a  QPSK 
signal set, together with (2n, k) binary LUEP codes
with separation vector s = (s_1, s_2), to obtain (n, k)
LUEP QPSK modulation codes which have squared
Euclidean separations (2s_1, 2s_2). Some of the
resulting LUEP QPSK block modulation codes have
the same MSED as that of optimal QPSK block
modulation codes of the same rate and length [2-3].
These LUEP QPSK modulation codes offer, in
addition, a larger MSED between code sequences
associated with the most important message bits, as
shown in Table 1, where * indicates LUEP QPSK
modulation codes based on the |u|u+v| construction.
G_1 and G_2 in Table 1 are asymptotic coding gains
corresponding to the components of the squared
Euclidean separation, for the most and least
significant message parts, respectively. R denotes
the code rate in bits per dimension. It should be
noted that all the optimal QPSK modulation codes
found by Sayegh [2-3], of lengths 5 to 10, can be
obtained based on the |u|u+v| construction and Gray
mapped QPSK signal sets. All these codes are in
fact LUEP QPSK modulation codes, and this appears
to be the first time that this has been pointed out.
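
The relation between Hamming distance on labels and squared Euclidean distance on Gray-mapped QPSK points can be verified numerically. The sketch below assumes a unit-energy constellation and one particular Gray labeling; the names `GRAY_QPSK`, `modulate`, `hamming` and `squared_euclidean` are illustrative choices, not notation from the paper.

```python
import math
from itertools import product

A = 1 / math.sqrt(2)  # unit-energy QPSK: every point has squared norm 1

# One possible Gray labeling (assumed): adjacent points differ in exactly one bit.
GRAY_QPSK = {
    (0, 0): (A, A),
    (0, 1): (-A, A),
    (1, 1): (-A, -A),
    (1, 0): (A, -A),
}


def modulate(bits):
    """Map a binary word of even length 2n to a sequence of n QPSK points."""
    return [GRAY_QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def squared_euclidean(p, q):
    return sum((px - qx) ** 2 + (py - qy) ** 2 for (px, py), (qx, qy) in zip(p, q))


# Check the factor-of-two relation for every pair of length-4 binary words.
words = list(product((0, 1), repeat=4))
max_gap = max(
    abs(squared_euclidean(modulate(a), modulate(b)) - 2 * hamming(a, b))
    for a in words
    for b in words
)
print(max_gap)
```

Because the relation holds symbol by symbol, a binary code of length 2n with separation vector (s_1, s_2) maps to a length-n QPSK code with squared Euclidean separations (2s_1, 2s_2), as stated above.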

Table  1:  Some  LUEP  QPSK  block  modulation  codes 
A new class of random sampling algorithms is presented for
the solution of estimation problems over hypothesis spaces $\mathcal{X}$
which are countable unions of Euclidean spaces of varying
dimension: $\mathcal{X} = \bigcup_k R^{n_k}$, with model k of dimension $n_k$. The
estimation problem is to choose parameters in $\mathcal{X}$ given some data.
The existence of a distribution μ on the parameter space $\mathcal{X}$ is
assumed, relating the parameters to the data, with μ taken as a
convex combination of $\mu_k$'s, each a distribution on subspace $R^{n_k}$.
The Bayesian conditional mean estimates of the parameters are
generated by constructing a Markov process sampling μ.

The Markov process X(t) is said to satisfy jump-diffusion
dynamics through $\mathcal{X}$ in the sense that (i) on random exponential
times the process jumps from one of the countably infinite set
of spaces $R^{n_k}$, k = 1, 2, ..., to another, and (ii) between jumps
it satisfies stochastic differential equations over the respective
spaces. We have proven [1, 2] that as long as the diffusions
have drifts which make the $\mu_k$ measures invariant within each
subspace, and the distribution μ on $\mathcal{X}$ is invariant for the
jump process, then μ is the invariant measure of the process.
This, coupled with the assumption that it is possible to get from
one space to another with a finite number of jumps, allows proof
of Harris recurrence and uniqueness of the invariant measure.
From this it follows that ergodic averages generated from the
process converge to their expectations, and that the transition
distribution of the process converges in variational norm to the
invariant measure.

These results are summarized via the following two theorems
taken from [1]. We assume that each of the distributions $\mu_k$
on $R^{n_k}$ has a density with respect to $n_k$-dimensional Lebesgue
measure of the Gibbs form, proportional to $e^{-E_k(x)}$.


Theorem 1 Let the jump-diffusion process X(t) have the properties
that

(a) the diffusion X(t) within any subspace $R^{n_k}$ satisfies the
stochastic differential equation

$dX(t) = -\tfrac{1}{2} \nabla E_k(X(t))\,dt + dW_k(t)$    (1)

with X(t), ∇(·) and $W_k(t) \in R^{n_k}$ the state, gradient and standard
vector Brownian motion, respectively, with the gradient $\nabla E_k(\cdot)$
satisfying Lipschitz continuity, and

(b) the jump intensities q(x, dy), q(x) defined in the standard
way

$q(x, dy) = \lim_{\tau \to 0} \frac{\Pr\{X(t+\tau) \in dy \mid X(t) = x\} - 1_{dy}(x)}{\tau}$    (2)

and $q(x) = \int_{\mathcal{X} \setminus x} q(x, dy)$ are both bounded continuous functions
satisfying

$q(x)\,\mu(dx) = \int_{\mathcal{X}} q(y, dx)\,\mu(dy)$.    (3)

Then μ is an invariant measure of X(t).
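
The diffusion part of the dynamics in (a) can be illustrated with a simple discretization. The sketch below is a one-dimensional Euler-Maruyama simulation within a single subspace, with no jumps; the energy E(x) = x^2/2, the step size, and the function name `langevin_samples` are assumptions made for illustration. For this energy the Gibbs density proportional to exp(-E(x)) is the standard normal, so long-run sample moments should be close to mean 0 and variance 1, up to discretization and Monte Carlo error.

```python
import math
import random


def langevin_samples(grad_energy, x0, dt, n_steps, rng):
    """Euler-Maruyama discretization of dX = -(1/2) grad E(X) dt + dW (scalar case)."""
    x = x0
    samples = []
    for _ in range(n_steps):
        x += -0.5 * grad_energy(x) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


rng = random.Random(0)  # fixed seed so the run is reproducible
# Assumed energy E(x) = x^2 / 2, whose gradient is x.
samples = langevin_samples(lambda x: x, x0=0.0, dt=0.01, n_steps=200_000, rng=rng)

mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, variance)
```

A full jump-diffusion sampler would interleave such diffusion segments with jumps between subspaces at random exponential times; that part is not sketched here.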


Theorem 2 Let X(t) be the Markov process satisfying Theorem
1, along with the assumption that the Euclidean spaces are
connected under the jumps, i.e., $\forall k, k'$, $\exists\, j(k, k') < \infty$ sequence of
jumps carrying the process from $R^{n_k}$ to $R^{n_{k'}}$.

Then μ is the unique invariant measure of the jump-diffusion
process X(t), and the associated chain X(iΔ), Δ > 0, converges
in total variation norm ‖·‖ to μ, the invariant measure:

for all $x \in \mathcal{X}$, $\lim_{i \to \infty} \| \Pr\{X(i\Delta) \in \cdot \mid X(0) = x\} - \mu(\cdot) \| = 0$.    (4)

We emphasize that the aforementioned results have been
generalized to unions of manifolds such as the m-dimensional torus
[5]. The motivation for introducing jump-diffusions arises in
object recognition [1, 3, 4], in which the different continuous and
discrete components of the discovery of both the shape and number
of objects in a scene are accommodated. Given a fixed number
of objects, call it k, the problem is to reshape, via group
transformations such as scale, rotation and translation, the k objects
to fit the acquired data. For this the model k consisting of $n_k$
parameters is fixed, with the hypothesis generation a continuous
diffusion through scale-rotation-translation parameter space $R^{n_k}$
following Langevin's stochastic differential equation. The second
part of the sampling process, the jump process, hypothesizes
new objects and removes objects, with a jump corresponding to
a transition from one continuum (model-order) to another.


References 

[1] U. Grenander and M. I. Miller. Representations of knowledge
in complex systems. Journal of the Royal Statistical Society,
in review, February 1992.

[2] Y. Amit, U. Grenander, and M. I. Miller. Ergodic properties
of jump-diffusion processes. Annals of Applied Probability,
submitted December 1992.

[3] M. I. Miller, D. Maffitt, J. Shrauner, B. Roysam, and
U. Grenander. Automated segmentation of biological shapes
in electron microscopic autoradiography. Proceedings of the
Twenty-Fifth Annual Conference on Information Sciences
and Systems, pages 637-642, 1991.

[4] A. Srivastava, M. I. Miller, and U. Grenander. Jump-diffusion
processes for object tracking and direction finding. In
Proceedings of the 29th Annual Allerton Conference on
Communication, Control and Computing, pages 563-570,
Urbana-Champaign, 1991. University of Illinois.


•S.ipporte.l  hy  \SK  ['VIA  ECF,  SSi'iSIS.  ARO  DAAL  (M  sfi-K  i)1 10  ARO  P  29349  MA  SDI  ONR  59922 




The  Optimal  Error  Exponent  for  Markov  Order  Estimation 
1. Introduction

A wide variety of approaches [1]-[5] have been developed
over the years to estimate the order of a Markov chain. However,
as stated in Merhav et al. [3], only recently has attention
been focused on estimators with optimality properties beyond
consistency. Our results are in the spirit of [3] and provide,
under more general conditions, a complete characterization of
the error exponents and consistency properties of a class of
order estimators.

Let $\{X_n, n \ge 1\}$ be a stochastic process with values in
$\mathcal{X} := \{1, \ldots, r\}$ and let P be the probability measure on $\mathcal{X}^\infty$
induced by $\{X_n\}$. The measure P is Markov of order k iff
$P(x_n \mid x_1^{n-1}) = P(x_n \mid x_{n-k}^{n-1})$ for n > k, where k is the
smallest constant for which the equality above holds. Let $\mathcal{P}_k$ be
the set of all stationary ergodic Markov measures on $\mathcal{X}^\infty$ of
order k. We observe the process $\{X_n\}$ of unknown measure
$P \in \bigcup_{k=1}^{k_0} \mathcal{P}_k$, where $k_0$ is a known constant, and wish to
estimate its order. We focus on estimators which satisfy a
generalized Neyman-Pearson criterion of optimality. Specifically, the
optimal order estimator minimizes the probability of
underestimation among all estimators whose probability of
overestimation lies below a prespecified level. Our main result identifies
the best exponent of decay of the probability of underestimation.
We further construct an estimator which achieves the
best exponent.

2. Preliminaries

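Given a sequence $x^n$ in $\mathcal{X}^n$, $n > k_0$, we define its $k_0$th
order Markov type as the empirical distribution on $\mathcal{X}^{k_0} \times \mathcal{X}$
given by $Q := \{q_{s,a},\ s \in \mathcal{X}^{k_0},\ a \in \mathcal{X}\}$ where:

$q_{s,a} := \frac{1}{n - k_0} \sum_{i=k_0+1}^{n} 1(x_{i-k_0}^{i-1} = s,\ x_i = a)$.

Let $q_s := \sum_a q_{s,a}$; define the conditional entropy of Q to
be:

$H(Q) := -\sum_{s,a} q_{s,a} \log \frac{q_{s,a}}{q_s}$,

with the convention that $q_{s,a}/q_s = 0$ if $q_s = 0$. For $P \in \bigcup_{k=1}^{k_0} \mathcal{P}_k$
we define the conditional divergence of Q and P as:

$D(Q \| P) := \sum_{s,a} q_{s,a} \log \frac{q_{s,a}/q_s}{P(a \mid s)}$.

Note that if $P \in \mathcal{P}_k$ for some $k < k_0$, $P(a \mid s)$ will depend only
on the latest k components of $s \in \mathcal{X}^{k_0}$.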
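
The Markov type and its conditional entropy are straightforward to compute from data. The sketch below assumes the standard empirical definition (counts of each context s of length k followed by symbol a, normalized by n - k, with natural logarithms); the function names `markov_type` and `conditional_entropy` are illustrative only.

```python
import math
from collections import Counter


def markov_type(x, k):
    """k-th order Markov type: empirical distribution of (context, next symbol) pairs."""
    counts = Counter((tuple(x[i - k:i]), x[i]) for i in range(k, len(x)))
    total = len(x) - k
    return {pair: c / total for pair, c in counts.items()}


def conditional_entropy(q):
    """H(Q) = -sum_{s,a} q_{s,a} log(q_{s,a} / q_s), in nats."""
    q_s = Counter()
    for (s, _), p in q.items():
        q_s[s] += p
    return -sum(p * math.log(p / q_s[s]) for (s, _), p in q.items())


# An alternating sequence looks like fair coin flips at order 0
# but is fully predictable at order 1.
x = [0, 1] * 5
h0 = conditional_entropy(markov_type(x, 0))
h1 = conditional_entropy(markov_type(x, 1))
print(h0, h1)
```

The drop from log 2 at order 0 to zero at order 1 is the kind of gap that an order estimator based on empirical entropies or divergences exploits.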

Let $T_Q$ be the set of all sequences in $\mathcal{X}^n$ with (common)
$k_0$th order Markov type Q. The following bounds have been
proved in Gutman [2]:

Lemma 1

$|T_Q| \ge n^{-r^{k_0}} (n+1)^{-r^{k_0+1}} \exp\{(n - k_0) H(Q)\}$, and
$|T_Q| \le \exp\{(n - k_0) H(Q)\}$.

Moreover, for $P \in \mathcal{P}_k$, $1 \le k \le k_0$, the following large deviation
estimates hold:

$P(T_Q) \ge (n+1)^{-r^{k_0+1}} \left( \min_{s \in \mathcal{X}^{k_0}} P(s) \right) \exp\{-(n - k_0) D(Q \| P)\}$,

$P(T_Q) \le r^{k_0} \exp\{-(n - k_0) D(Q \| P)\}$. □

Institute for Systems Research, University of Maryland, College Park, MD 20742, USA.
C.-C. Liu is with IBM Corporation, Poughkeepsie, NY 12602, USA.

P. Narayan is with the Electrical Engineering Department and the Institute for Systems
Research, University of Maryland, College Park, MD 20742, USA.

3.  Main  Results 

Theorem  1  below  specifies  the  rates  of  decay  to  zero  of 
the  probabilities  of  underestimation  and  overestimation  for  the 
following  estimator. 

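Given $x^n \in \mathcal{X}^n$, $n > k_0$, $k_n(x^n) = k$ iff:

(i) $D(Q \| P') > \epsilon_n$ for all $P' \in \mathcal{P}_{k'}$, $k' < k$,

(ii) $D(Q \| P) \le \epsilon_n$ for some $P \in \mathcal{P}_k$,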

where  e„  :=  (r**"*'*  +  and  j  >  0  is  a  constant  that  will 

be specified later. If neither condition above is satisfied, set
$k_n(x^n) = k_0$.

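Theorem 1

Fix δ > 0, γ > 0. Fix $P \in \mathcal{P}_k$ for some $1 \le k \le k_0$.

(i) $P(k_n(X^n) > k) \le r^{k_0} n^{-\delta}$, $n \ge N(\delta, \gamma, k_0)$,

(ii) $P(k_n(X^n) < k) \le \exp\{-(n - k_0)(\min_{k' \le k-1} D(\mathcal{P}_{k'} \| P) - \gamma)\}$, $n \ge N(\delta, \gamma, k_0)$,

where $D(\mathcal{P}_{k'} \| P) := \inf_{P' \in \mathcal{P}_{k'}} D(P' \| P)$. □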

Remarks

1. Any choice of δ > 1 yields strong consistency for our
estimator, i.e., $k_n(X^n) \to k$ P-a.s. Clearly the overestimation
probability can be reduced by choosing a larger value of δ but
only at the expense of a larger sample size $N(\delta, \gamma, k_0)$ in (ii).

2. Observe that $D(\mathcal{P}_{k'} \| P)$ in (ii) is strictly positive since
the closure of $\mathcal{P}_{k'}$ does not intersect $\mathcal{P}_k$ for k' < k.

Theorem  2  below  establishes  that  the  rate  of  decay  in  The¬ 
orem  1  (ii)  cannot  be  bettered. 

Theorem 2

Let 0 < α < 1 be given. Let $k_n(X^n)$ be any estimator
such that for each $P \in \mathcal{P}_k$, $1 \le k \le k_0$: $P(k_n(X^n) > k) \le \alpha$
for $n \ge N(\alpha, k_0, P)$. Then for any γ > 0 it holds that:

$P(k_n(X^n) < k) \ge \exp\{-(n - k_0)[\min_{k' \le k-1} D(\mathcal{P}_{k'} \| P) + \gamma]\}$

for $n \ge N(\alpha, \gamma, k_0, P)$. □
ABSTRACT

The  EM  algorithm  is  a  popular  iterative  method  for  finding  the 
maximum  likelihood  estimate  when  the  likelihood  function  is  ei¬ 
ther  non-analytical  or  its  functional  form  is  too  difficult  to  max¬ 
imize  directly.  In  this  paper  we  analyze  the  convergence  proper¬ 
ties  of  the  EM  algorithm.  By  representing  the  E  step  in  a  Taylor 
series  with  remainder  we  obtain  a  derivation  of  region  of  conver¬ 
gence  and  asymptotic  convergence  rates  for  a  specified  complete 
data  space.  These  results  can  help  one  tailor  the  choice  of  com¬ 
plete  data  space  so  as  to  achieve  an  optimal  tradeoff  between  ease 
of  implementation  and  rapid  convergence  of  the  EM  algorithm. 

I.  Main  Results 

Let θ denote a point in parameter space $\Theta \subset R^p$ which
parameterizes the density f(y; θ) of the set of observations Y. Now
define a hypothetical data set X with density g(x; θ) which is
related to the actual data Y in the sense that the conditional
distribution dP(y|x, θ) is functionally independent of θ. Equivalently,
Y can be interpreted as the output of a θ-independent
communications channel C with input X. X is called the complete data
and Y is called the incomplete data. The EM algorithm has been
widely applied to iteratively approximate the maximum likelihood
estimate $\hat{\theta} = \arg\max_\theta \ln f(Y; \theta)$ [1, 7, 4, 5, 2, 6]. For an
initial point $\theta^0$ the EM algorithm produces a sequence of points
via a recursion whose form is equivalent to [3]:

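EM Algorithm:

$\theta^{i+1} = \arg\max_\theta Q(\theta; \theta^i)$, i = 1, 2, ...    (1)

where $Q(\theta; \bar{\theta})$ is the difference between the incomplete data
likelihood function L(θ) = ln f(y; θ) and the Kullback-Leibler
information divergence:

$Q(\theta; \bar{\theta}) = L(\theta) - D(\bar{\theta} \| \theta)$,

where

$D(\bar{\theta} \| \theta) = \int \log \frac{g(x|y; \bar{\theta})}{g(x|y; \theta)}\, g(x|y; \bar{\theta})\, dx$.    (2)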

For any non-negative definite symmetric matrix A define the
spectral radius ρ(A) as the maximum eigenvalue of A. For $\theta \in \Theta$
define the Hessian matrices:

Q  ^  -V”Q(8j) 

D  "  -V^D{8:8) 

L  =  -Vn(8). 

Define $R_+ \subset \Theta$ as the largest open ball with center $\hat{\theta}$ such that
for each $\bar{\theta} \in R_+$:

-  /'(I  -  -*■(!-  (I  -  t)i)dt  >  0,  V«  (3) 

Jo 


In the sequel $R_+$ will be identified as a set of initial points $\theta^0$
for which the EM algorithm is guaranteed to converge, that is,
as a region of convergence in the following theorem.

Theorem 1 Assume: i) the MLE, $\hat{\theta} = \arg\max_\theta L(\theta)$, occurs in
the interior of the parameter space $\Theta \subset R^p$; ii) L(θ) and $D(\bar{\theta} \| \theta)$
are twice continuously differentiable in θ and $\bar{\theta}$. For $\theta^0$ an initial
point let $\{\theta^i\}$ denote the sequence of points produced by the
EM algorithm. Then:

1. if $\theta^0 \in R_+$ the EM sequence converges to $\hat{\theta}$,

2. and the asymptotic convergence rate is linear with root
convergence factor $\rho(I - Q^{-1}L) = \rho(Q^{-1}D)$.

Theorem 1 is proven via a simple application of Taylor's Theorem
with remainder. While Theorem 1 requires stronger assumptions
(differentiability of L and D) than the convergence results
stated in [8], our proof is more elementary and we come up with
a region of convergence $R_+$. One can apply the
results of Theorem 1 to compare different choices of complete data
in terms of radius of convergence and speed of convergence.
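
The EM recursion and its monotone, roughly geometric convergence can be seen on a toy problem. The sketch below is an assumed example, not taken from the paper: the unknown parameter is the mixing weight of a two-component Gaussian mixture with known means 0 and 3 and unit variances, the complete data are the observations together with their hidden component labels, and the data values and function names are made up for illustration.

```python
import math


def normal_pdf(y, mean):
    """Unit-variance Gaussian density."""
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2 * math.pi)


def log_likelihood(data, weight):
    """Incomplete-data log-likelihood L(weight) of the mixture."""
    return sum(
        math.log(weight * normal_pdf(y, 0.0) + (1 - weight) * normal_pdf(y, 3.0))
        for y in data
    )


def em_step(data, weight):
    """One EM iteration: E step gives responsibilities, M step averages them."""
    resp = []
    for y in data:
        a = weight * normal_pdf(y, 0.0)
        b = (1 - weight) * normal_pdf(y, 3.0)
        resp.append(a / (a + b))
    return sum(resp) / len(resp)


def run_em(data, weight0, n_iter):
    weights = [weight0]
    for _ in range(n_iter):
        weights.append(em_step(data, weights[-1]))
    return weights


data = [-0.8, -0.1, 0.3, 0.9, 1.4, 2.2, 2.9, 3.1, 3.6, 4.0]  # made-up observations
weights = run_em(data, weight0=0.9, n_iter=50)
likelihoods = [log_likelihood(data, w) for w in weights]

# Ratios of successive increments estimate the linear convergence factor.
ratios = [
    (weights[i + 2] - weights[i + 1]) / (weights[i + 1] - weights[i])
    for i in range(3, 8)
]
print(weights[-1], ratios)
```

In this scalar case the ratios settle to a constant below one, which plays the role of the root convergence factor in Theorem 1; a less informative choice of complete data would push that factor closer to one.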
1 Introduction

Consider the following communication model

$x(t) = \sum_k s_k h(t - kT) + n(t)$,    (1)

where h(·) is the channel impulse response; $\{s_k\}$ and T are the
information symbol sequence and symbol interval, respectively.
The "blind" identification problem addressed in this paper is the
identifiability of a possibly nonminimum phase channel h(·) given
only the observation process x(·).

The following assumptions are imposed on the above model:

(1) $\{s_k\}$ is an i.i.d. sequence.

(2) The symbol interval T is an integer.

(3) The channel has a finite impulse response.

(4) The noise process is uncorrelated with $\{s_k\}$, with known
second-order statistics.


2  A  Necessary  and  Sufficient  Condi¬ 
tion  in  Frequency  Domain 

Under the assumed conditions, the observation process x(·) is a
cyclostationary process. Unlike in the stationary case, the
second-order statistics contain the phase information of the channel.
The identification of H(z) is approached by identifying its
zeros from those of $\{\Gamma^{(k)}(z)\}$, where $\{\Gamma^{(k)}(z)\}$ is obtained from
the observation spectra. The relation between $\Gamma^{(k)}(z)$ and H(z)
is given by

r<*>(*)  =  //(s)ff(e^*v  1),  fc  =  1.2,  •  •  • .  (2) 

The problem of channel identification is then equivalent to
identifying H(z) from $\Gamma^{(k)}(z)$.

The following theorem provides a necessary and sufficient
condition for the channel identifiability.

Theorem 1 H(z) is uniquely determined (identifiable) by $\{\Gamma^{(k)}(z)\}$
up to a constant if and only if H(z) does not have uniformly
$\frac{2\pi}{T}$-spaced zeros. Moreover, if the channel is identifiable,

$Z(H(z)) = \bigcap_k Z(\Gamma^{(k)}(z))$,    (3)

where Z(H(z)) stands for the set of zeros of H(z).


3  A  Necessary  and  Sufficient  Condi¬ 
tion  in  Time  Domain 


A time-domain necessary and sufficient condition of channel
identifiability is obtained by using a vector representation of the
baseband model

$x(n) = [x(nT)\ x(nT+1)\ \ldots\ x(nT+T-1)]^t$.    (4)

We then have, from the channel model,

$x(n) = \sum_k s_k h_{n-k} + n_n$,    (5)

where

$h_k = [h(kT)\ h(kT+1)\ \ldots\ h(kT+T-1)]^t$,    (6)

$n_k = [n(kT)\ n(kT+1)\ \ldots\ n(kT+T-1)]^t$.    (7)

With  the  above  formulation,  we  have  the  following  theorem. 


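Theorem 2 The channel impulse response can be determined
uniquely up to a constant if and only if there exists an integer
d such that the matrix $H_d$ has full column rank, where

$H_d = \begin{pmatrix} h_0 & h_1 & \cdots & h_L & 0 & \cdots \\ 0 & h_0 & h_1 & \cdots & h_L & \cdots \\ \vdots & & \ddots & & & \\ 0 & \cdots & 0 & h_0 & \cdots & h_L \end{pmatrix}$.    (8)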

The proof of the above theorem establishes the connection
between the rank condition and the condition involving the location
of the zeros. The sufficiency part of this theorem is equivalent to
the one in [1].
The kth-order joint distribution for an ergodic finite-alphabet
process can be estimated from a sample path of
length n by sliding a window of length k along the sample
path and counting frequencies of k-blocks. If k is fixed
the procedure is consistent in that the resulting empirical
block distribution will almost surely converge to the true
distribution of k-blocks as n → ∞, a fact guaranteed by
the ergodic theorem. The consistency of such estimates
is important when using training sequences, that is, finite
sample paths, to design engineering systems. The empirical
k-block distribution for a training sequence is used as
the basis for design, after which the system is run on other,
independently drawn sample paths. There are some situations,
such as data compression, where it is good to make
the block length as long as possible. Thus it would be
desirable to have consistency results for the case when the
block length function k = k(n) grows as rapidly as possible,
as a function of sample path length n. This is the
problem addressed in this paper.

A sequence {k(n)} will be said to be admissible for
a given ergodic process μ if the variational distance
between the true distribution and the empirical distribution
of k(n)-blocks converges almost surely to 0 as n → ∞.
Every ergodic process has an admissible sequence such that
lim_n k(n) = ∞, by the ergodic theorem, and, for any
sequence k(n) → ∞ there is an ergodic measure for which
{k(n)} is not admissible.
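
The sliding-window estimate and the distance used in the definition of admissibility can be sketched in a few lines. The function names below are illustrative, and the variational distance is taken here as the sum of absolute differences over all blocks (some authors use half of this quantity).

```python
from collections import Counter
from itertools import product


def empirical_block_distribution(x, k):
    """Slide a length-k window along x and return relative frequencies of k-blocks."""
    windows = [tuple(x[i:i + k]) for i in range(len(x) - k + 1)]
    counts = Counter(windows)
    return {block: c / len(windows) for block, c in counts.items()}


def variational_distance(p, q):
    """Sum of |p(b) - q(b)| over all blocks b (no factor 1/2)."""
    return sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in set(p) | set(q))


# Toy example: compare the empirical 2-block distribution of a short path
# with the uniform distribution on binary 2-blocks (e.g. fair coin tossing).
path = [0, 1, 1, 0]
empirical = empirical_block_distribution(path, 2)
uniform = {block: 0.25 for block in product((0, 1), repeat=2)}
distance = variational_distance(empirical, uniform)
print(empirical, distance)
```

With only n - k + 1 windows available, at most that many distinct k-blocks can receive positive empirical mass, which is why the block length cannot grow too quickly with n.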

Entropy plays a role in this problem, because if k(n) ≥
(1 + ε)(log n)/H, then the empirical k-block distribution
cannot be close to the true distribution, for, by the
entropy theorem, most of the probability is concentrated on
a set of k-blocks of cardinality roughly $2^{k(n)H}$ (logarithms to
base 2), which is much larger than n. Thus the interesting
question is whether there are any processes for which
consistent estimation is possible if k(n) ~ (1 - ε)(log n)/H.

It  is  shown  in  this  paper  that  the  answer  is  yes  for  the 
class  of  Markov  processes  as  well  as  for  somewhat  larger 
classes,  such  as  the  class  of  finite  state  processes,  and  in 
a  slightly  weaker  form  for  the  class  of  processes  for  which 
past  and  future  become  asymptotically  independent  in  the 
weak  Bernoulli  sense.  The  proofs  depend  on  an  extension 
of  the  Sanov-Hoeffding  large  deviations  bound,  together 
with  an  inequality  due  to  Pinsker. 

For  the  class  of  functions  of  Markov  chains,  this  work 
sharpens  prior  results  obtained  for  more  general  classes 
of  processes  by  Ornstein  and  Weiss  and  by  Ornstein  and 
Shields, which used the d̄-distance rather than the
variational distance.

Extensions  and  applications  of  these  results  will  also  be 
discussed.

Abstract

A new upper bound is introduced on the estimation of the probability
density function through spectral analysis. The upper bound is
shown to decrease steadily as the modulating index is increased, for the
Gaussian case.


Summary 


Estimation of the probability density function (pdf) of a stochastic process is
commonly based on a time series approach [1]. This is done by measurement
of the time spent by the signal between two specified levels or through a
pulse counting process, for discrete signals. This usually leads to biased and
inconsistent estimates, and to mean square errors that depend on the pdf
itself [2]. It is a common practice to assume the stationarity and ergodicity
of the random process under analysis [3].

The main purpose of this paper is to present a new bound on the estimation
of the probability density function of random signals, using Woodward’s
theorem, correlation techniques and spectral analysis [4] [6]. The proposed
method is based on the spectral analysis of the random process.

Woodward’s theorem asserts that the spectrum of a high-index frequency
modulated waveform can be approximated by the probability distribution of
its instantaneous frequency deviation [5]. A new proof of the theorem was
developed previously, which gave the following result for the power spectrum
density (PSD) of the modulated signal [6].


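S_S(\omega) = \frac{\pi A^2}{2D} \left[ p_M\left(\frac{\omega + \omega_c}{D}\right) + p_M\left(\frac{\omega - \omega_c}{D}\right) \right].  (1)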

where the constant parameters A, ω_c and D represent respectively the carrier
amplitude, frequency (rd/s) and frequency deviation index. The signal whose
pdf (p_M(m)) one intends to analyse is represented by m(t), here considered
a zero mean random stationary process, limited in frequency to ω_M. The
phase of the carrier φ is random, uniformly distributed in the range (0, 2π)
and statistically independent of m(t).

The difference between the estimate of the PSD function and the actual
PSD is the estimation error E_S. An upper bound for this error is evaluated
below and is shown to decrease with an increase in the modulation index
β. The approximation error is given by


m 


where S_S(ω) represents the actual spectrum and Ŝ_S(ω) stands for the
approximation.

Considering  the  limiting  case  (r  =  r/Pviy  =  \/Pty),  an  upper  bound 
on the normalised error can be determined. Substituting the expressions for
S_S(ω) and Ŝ_S(ω) into equation 2, evaluating the expectancies at ω = 0 and
using the following inequality [7]


\tau^2 E[(m'(t))^2] \ge E[(m(t + \tau) - m(t))^2]  (3)


leads  to 


(4) 


where Q(x) is the Q-function, defined by

The above expression is a very tight bound and shows the error dependency
on the modulating index β. The efficiency of the estimation used is assured
because the variance goes to zero. The estimation always gets better as the
modulating index is increased, which implies a decrease in frequency or an
increase in the power of the signal [8]. This also implies the consistency of the
method. All the relevant information is available for the estimation, giving
sufficiency to the estimator.

A digital computer implementation of the method was performed through
contract No. C.NB.085.16 with EMBRATEL, and has been tested in practice
with good results [9].

A Cramer-Rao Type Lower Bound for Estimators Satisfying a
Bias Constraint*

Alfred  Hero 

In  this  paper  we  give  a  Cramer-Rao  (CR)  type  lower  bound  on  esti¬ 
mator  covariance  which  applies  to  any  estimator  whose  bias  gradient  lies 
within  a  user  specified  ellipsoidal  region  of  parameter  space.  In  addition 
to  providing  a  useful  lower  bound  which  is  insensitive  to  small  unknown 
estimator biases, the rate of change of the new bound provides a quantitative
bias “sensitivity index” for the conventional bias-dependent CR bound.
We  give  an  analytical  form  for  this  sensitivity  index  which  indicates  that 
small  estimator  biases  can  make  the  new  bound  significantly  less  than  the 
unbiased  version  of  the  CR  bound  when  there  exist  important  but  difficult- 
to-estimate  nuisance  parameters.  This  implies  that  the  application  of  the 
CR  bound  is  unreliable  for  this  situation  due  to  severe  bias  sensitivity.  As  a 
practical  illustration  of  these  results,  we  consider  the  problem  of  estimating 
elements of the 2 × 2 covariance matrix associated with a pair of independent,
identically distributed, zero-mean Gaussian random sequences.

NON-LINEAR, NON-BINARY CYCLIC GROUP CODES

G.  Solomon 
Los  Angeles,  CA 


Abstract 

New cyclic group codes of length 2^m - 1 over (m - j)-bit
symbols are introduced. These codes may be systematically
encoded and decoded algebraically. The code rates are very
close to RS codes and are much better than BCH codes (a former
alternative). The (m - j)-binary tuples are identified with
a subgroup of the binary m-tuples which represent the field
GF(2^m). Encoding is systematic and involves a two-stage procedure:
the usual linear feedback register (using the division or
check polynomial), and a small table look-up. For low rates, a
second shift register encoding operation may be invoked. Decoding
uses the Reed-Solomon error correcting procedures for
the m-tuple alphabet, i.e., the field elements GF(2^m).

SUMMARY 

Group codes of lengths up to 2^m over binary (m - 1)-tuples
are first introduced and are shown to be cyclic and then
systematically encodable. These (m - 1)-tuples are identified
with an additive subgroup of the field GF(2^m). These codes are
not linear. That is, a codeword does not admit multiplication
by a GF(2^m) field element to yield another codeword.

Consider the field GF(2^m) along with a primitive element β
which generates the n = (2^m - 1) roots of unity. In addition,
β is chosen with the following properties: 1) m odd: Tr β^i = 0
for 1 ≤ i ≤ m - 1, where Tr denotes the linear field operator
trace, Tr β = β + β^2 + β^4 + ··· + β^{2^{m-1}}. So Tr β ∈ GF(2), Tr
β^2 = Tr β, Tr cx^2 = Tr √c x, for c, x ∈ GF(2^m). 2) m even:
Tr β^i = 0 for 0 ≤ i ≤ m - 1 except for a single odd integer p,
p < m, and Tr β^p = 1.

The following are polynomials for β which satisfy the conditions
1) and 2) above for 3 ≤ m ≤ 12.

 m    Polynomial for β    explanation
 3    3 1 0               (x^3 + x + 1)
 4    4 1 0
 5    5 3 0
 6    6 1 0               (Tr β^5 = 1)
 7    7 3 0
 8    8 4 3 2 0           (Tr β^5 = 1)
 9    9 5 0
10    10 3 0              (Tr β^7 = 1)
11    11 9 0
12    12 6 4 1 0          (Tr β^11 = 1)
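
The trace conditions 1) and 2) can be checked mechanically for the listed polynomials. The sketch below is an illustration only: it assumes β is the residue class of x modulo the listed polynomial, represents field elements as integers whose bits are polynomial coefficients, and uses hypothetical helper names. For odd m the only power β^i, 0 ≤ i ≤ m - 1, with trace one should be i = 0 (since Tr 1 = 1); for even m it should be the single odd exponent p.

```python
# Exponent lists copied from the table; e.g. (3, 1, 0) means x^3 + x + 1.
POLYNOMIALS = {
    3: (3, 1, 0),
    4: (4, 1, 0),
    5: (5, 3, 0),
    6: (6, 1, 0),
    7: (7, 3, 0),
    8: (8, 4, 3, 2, 0),
    9: (9, 5, 0),
    10: (10, 3, 0),
    11: (11, 9, 0),
    12: (12, 6, 4, 1, 0),
}


def gf_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m); poly is the bit mask of the modulus."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):
            a ^= poly  # reduce as soon as the degree reaches m
    return result


def trace(a, poly, m):
    """Tr(a) = a + a^2 + a^4 + ... + a^(2^(m-1)); the result is 0 or 1."""
    total, y = 0, a
    for _ in range(m):
        total ^= y
        y = gf_mul(y, y, poly, m)
    return total


def trace_one_positions(m):
    """Exponents i in 0..m-1 for which Tr(beta^i) = 1, with beta = x."""
    poly = sum(1 << e for e in POLYNOMIALS[m])
    return [i for i in range(m) if trace(1 << i, poly, m) == 1]


for m in sorted(POLYNOMIALS):
    print(m, trace_one_positions(m))
```

The check only covers the trace conditions; it does not test that each polynomial is primitive.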

Codes of length greater than 4096 are rarely invoked in
present day block coding techniques. Do these properties
extend beyond m = 12?

An element c ∈ GF(2^m) may be represented by c = \sum_{i=0}^{m-1} c_i β^i.

One may identify Tr c by its binary representation (c_i), 0 ≤
i ≤ m - 1, and single out c_0 for m odd, and c_p for m even.
Thus the binary value Tr c is determined by only the trace one
position (0 or p) in its binary m-bit representation. Choose an
(n, k; d) Reed-Solomon code over GF(2^m) so that the codewords
are values of sets of polynomials P(x) with coefficients
in GF(2^m) of fixed highest degree (n - d) or (n - d - 1). A
codeword a = (a_j) is represented by the values of a polynomial
P_a(x) so that a_j = P_a(β^j), 0 ≤ j ≤ n - 1.


Restrict P_a(x) for all codewords a to a (m - 1) order subgroup
of GF(2^m) by stipulating that Tr P(x) = 0 for x ∈ GF(2^m).
(P(x) as written here is generic for all P_a(x).) The codes thus
generated are cyclic group codes over (m - 1)-bit symbols and
are systematically encodable for codes meeting the conditions
in the main theorem.

Examples: 

1. Take the RS code A of dimension 5 over GF(2^m), a ∈
A, a = (a_i), a_i = P_a(β^i). The polynomials P_a(x) are of
degree 4 with Tr P_a(x) = 0, for all x ∈ GF(2^m). For a
general P(x), dropping the subscript a, P(x) = A + Bx +
Cx^2 + Dx^3 + Ex^4; A, B, C, D, E ∈ GF(2^m). The condition
that Tr P_a(x) = 0 gives Tr A = 0, B^4 + C^2 + E = 0, D = 0.
This code has binary dimension (m - 1) + 2m.

For m = 3, we get binary dimension 8 or dimension 4 over
2-tuples, i.e., a (7, 4; 3) code over binary doubles. This is
a reduction from the (7, 5; 3) RS code over binary triples!

There exists no integer dimension over (m - 1)-tuples for
m > 3 since (m - 1) + 2m is not a multiple of (m - 1).

2. Take an RS code of dimension 11 over GF(16) but choose
as your Mattson-Solomon (M-S) set the polynomials P(x)
of degree 11, setting the constant term equal to zero.

P(x) = \sum_{i=1}^{11} c_i x^i; Tr P(x) = 0 leads to

c_1^8 + c_2^4 + c_4^2 + c_8 = 0
c_3^8 + c_6^4 + c_9 = 0
c_5^2 + c_5^8 + c_{10} + c_{10}^4 = 0
c_{11}^2 + c_7 = 0

The number of binary dimensions is 12 + 8 + 6 + 4 =
30, which is dimension 10 over binary triples. Thus the
(15, 11; 5) RS code over GF(16) is transformed into the
non-systematic (15, 10; 5) code over trace zero elements
of GF(16).
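
The binary dimension count of 30 in Example 2 can be cross-checked with linear algebra over GF(2). This is a sketch under stated assumptions rather than the author's procedure: GF(16) is built from x^4 + x + 1 (the m = 4 entry of the table), β is the class of x, and the helper names are hypothetical. The map from the 44 coefficient bits of P(x) to the values Tr P(x) on the 15 nonzero field elements is GF(2)-linear, so the number of admissible polynomials is 2^(44 - rank).

```python
def gf16_mul(a, b):
    """Multiply in GF(16) = GF(2)[x]/(x^4 + x + 1); elements are 4-bit ints."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= 0b10011
    return result


def gf16_pow(a, e):
    """Raise a to the non-negative integer power e by repeated multiplication."""
    result = 1
    for _ in range(e):
        result = gf16_mul(result, a)
    return result


def gf16_trace(a):
    """Tr(a) = a + a^2 + a^4 + a^8, which is 0 or 1."""
    total, y = 0, a
    for _ in range(4):
        total ^= y
        y = gf16_mul(y, y)
    return total


def gf2_rank(rows):
    """Rank over GF(2) of row vectors packed into Python ints."""
    pivots = {}  # leading bit position -> reduced row
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = row
                break
            row ^= pivots[lead]
    return len(pivots)


points = [gf16_pow(2, j) for j in range(15)]  # beta^0 .. beta^14

rows = []
for i in range(1, 12):        # coefficient c_i of x^i, i = 1..11
    for bit in range(4):      # c_i runs over a GF(2)-basis of GF(16)
        c = 1 << bit
        row = 0
        for j, x in enumerate(points):
            if gf16_trace(gf16_mul(c, gf16_pow(x, i))):
                row |= 1 << j
        rows.append(row)

rank = gf2_rank(rows)
binary_dimension = len(rows) - rank
print(rank, binary_dimension)
```

The rank equals the total number of binary constraints imposed by the four relations (4 + 4 + 2 + 4), which leaves 12 + 8 + 6 + 4 free binary dimensions.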

3. Similarly the RS (15, 7; 9) code over GF(16) using polynomials
of degree 6 from 0 to 6, under analogous techniques,
gives the relations Tr c_0 = 0; c_1^4 + c_2^2 + c_4 = 0; c_6 + c_3^2 =
0; c_5 = 0.

Binary dimension count is 3 + 8 + 4 = 15. This yields a
(15, 5; 9) code over triples.

Compare  this  to 

•  (15,  5;  11)  RS  over  4-tuples, 

•  (15,  5;  7)  BCH  over  GF(8)  and  GF(2). 

•  (15,  4;  10)  BCH  over  GF(4).  (doubles). 

These non-systematic codes are cyclic. The extension to
(m - j)-bit symbols and the systematic construction of these
codes can be found in "Nonlinear, Nonbinary Cyclic Codes" by
G. Solomon, NASA Code 310-10-63-53-00, TDA Progress Report
42-108, Jet Propulsion Laboratory, October-December 1991.

This work was performed by the author while acting as a consultant to
the Jet Propulsion Laboratory, California Institute of Technology, under
contract to the National Aeronautics and Space Administration.

Classification of Cosets
of the Reed-Muller Code R(m - 3, m)
Xiang-dong  Hou 

Department of Mathematics and Statistics
Wright State University
Dayton, Ohio 45435

The covering radius of the (m-3)rd order Reed-Muller code
R(m - 3, m) of length 2^m has been known. This talk aims at
a complete classification and further properties of the cosets of
R(m - 3, m).

The general affine group GA(m, 2) is an automorphism group
of R(m - 3, m), and is the full automorphism group when m > 4.
Hence GA(m, 2) acts on the set C of all cosets of R(m - 3, m).
The cosets of even weight in C correspond to m × m symmetric
matrices over GF(2), and their GL(m, 2) orbits correspond to
the congruence classes of m × m matrices over GF(2). The same
thing happens with respect to the cosets of odd weight in C.
Using the well-known classification of symmetric matrices over
GF(2) under congruence, we get the classification of C under
the action of GL(m, 2). The classification of C under the action
of GA(m, 2) follows immediately. Representatives and sizes of
the GA(m, 2)-orbits in C are given. The minimal weights of the
cosets in C are determined.

We also identify all the orphan cosets of R(m - 3, m). It
turns out that all the orphans of R(m - 3, m) are 0-covered, i.e.,
for any orphan C of R(m - 3, m) and any coordinate position,
there is a coset leader of C whose coordinate at the given position
is 0. This implies that R(m - 3, m) is normal.

Finally, we turn to the weight distributions of the cosets in
C. For any coset C ∈ C, we derive a general recursive formula
for computing the weight distribution of C. The recursion starts
at the minimal weight μ of C. When w ≥ μ is not far away
from μ, the formula gives the number of vectors of weight w in C
rather easily in terms of certain functions |A_i(C)| of C. |A_i(C)|
is difficult to compute for general C. However, when C is one of
those representatives of the GA(m, 2)-orbits in C, we are able to
give explicit formulas for |A_i(C)|.

IDEMPOTENTS AND MINIMUM WEIGHTS OF
PRIME POWER LENGTH CYCLIC CODES
OVER ARBITRARY FIELDS

Vanessa Job, Marymount University, Arlington, VA 22207

Let p be a prime and let q be a prime power relatively
prime to p. Let z be the greatest integer such that p^z | (q^t - 1),
where t is the order of q modulo p. Assume that p and q
have been chosen so that z = 1. (Note that it is very unusual
to have z > 1.) We give a characterization of idempotents
of length p^{m+1} cyclic codes over GF(q) in terms of the idempotents
of length p cyclic codes over GF(q). We define two
classes of length p^{m+1} cyclic codes, the repeated p codes and
the expanded p^m codes, which are derived from length p and
p^m cyclic codes, respectively, and give the idempotents of these
codes in terms of the idempotents of the codes from which they
were derived. We also give the weight enumerators of codes in
these classes as a function of the weight enumerators of the
codes from which they were derived. Finally, we show that
every length p^{m+1} code can be uniquely expressed as a sum
of a repeated p code C_1 and an expanded p^m code C_2 and
show that the sum must have minimum weight less than or
equal to min(p^m d_1, d_2), where d_1 is the minimum weight of
the length p code from which C_1 was derived and d_2 is the
minimum weight of the length p^m code from which C_2 was
derived. Using these results, we give an algorithm for constructing
idempotents for prime power length duadic, triadic,
and polyadic codes, generalizations of quadratic residue codes
to nonprime lengths n and dimensions other than (n - 1)/2 and
(n + 1)/2. We show that the minimum weight of prime power
length polyadic codes is unlikely to be greatest possible, distinguishing
them from polyadic codes of square free length, which
frequently have greatest possible or greatest possible known
minimum weight for codes of their length and dimension.

Constructing Reed-Muller Codes from
Reed-Solomon Codes over GF(q)

Frank  R.  Kschischang 


1  Summary 

The [q, k] extended Reed-Solomon codes over GF(q) are nested,
that is,

[q, q] ⊃ [q, q - 1] ⊃ ··· ⊃ [q, 1] ⊃ [q, 0] = {0}.

This means that the [q, k - 1] code is a subgroup of the [q, k] code
and hence the [q, k] code may be partitioned into the q cosets of
the [q, k - 1] code. This code, in turn, may be partitioned into the
q cosets of the [q, k - 2] code, etc., thus forming the set partition
chain [q, q]/[q, q - 1]/···/[q, 1]/[q, 0]. Since these codes are all
maximum distance separable or MDS (the minimum Hamming
distance of the [q, k] code is q - k + 1), the intrasubset distances
form the sequence {1, 2, 3, ..., q, ∞}, where ∞ is used to denote
the intrasubset distance of a set with one element (a singleton).

Let g_1 be a nonzero codeword in the [q, 1] code. Then g_1 is a
generator for the code. Let g_2 be a nonzero codeword in the [q, 2]
code that is not in the [q, 1] code, i.e., an element of the relative
complement of [q, 1] in [q, 2]. Then {g_1, g_2} generates the [q, 2]
code. Proceeding in this way we can obtain a universal generator
matrix

\begin{bmatrix} g_q \\ \vdots \\ g_2 \\ g_1 \end{bmatrix}

the last k rows of which generate the [q, k] code.

The basic multilevel (sometimes called generalized concatenated
or hierarchical) code construction technique (see, e.g., [1,2])
is based on precisely the type of set partitioning described above.
The construction combines q component codes of block length n to
obtain (in this case) a code of length nq. Although not necessary
for the construction, we consider here only linear codes. Denoting
the parameters of the component codes by [n, k_1, d_1], [n, k_2, d_2],
..., [n, k_q, d_q], the multilevel construction combines these codes to
obtain an

[nq, k_1 + k_2 + ··· + k_q, d = min(d_1, 2d_2, 3d_3, ..., q d_q)]

code. Wu and Costello used this construction method in [3] to
obtain new codes over GF(q). In “code formula” form, we have

C = [n, k_1, d_1] ⊗ g_q + [n, k_2, d_2] ⊗ g_{q-1} + ··· + [n, k_q, d_q] ⊗ g_1

where “⊗” denotes a Kronecker product.

For example, when q = 2, we take g_1 = 11 and g_2 = 01.
Applying the multilevel construction, we combine two codes V =
[n, k_1, d_1] and U = [n, k_2, d_2] to obtain

C = [n, k_1, d_1] ⊗ 01 + [n, k_2, d_2] ⊗ 11.

If V has generator matrix G_V and U has generator matrix G_U,
then C has generator matrix

G = \begin{bmatrix} 0 & G_V \\ G_U & G_U \end{bmatrix}.


The author is with the Department of Electrical and Computer Engineering,
University of Toronto, Toronto, Ontario, CANADA M5S 1A4, tel: (416)
978-0461, fax: (416) 978-7423, e-mail: frank@comm.toronto.edu


By the basic properties of the multilevel construction, C has minimum
distance d = min(d_1, 2d_2). Clearly this is the well-known
|U|U + V| construction [4].
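
The q = 2 case can be tried on a small pair of codes. The sketch below is an illustration under assumptions not taken from the summary: V is chosen as the [4, 1, 4] repetition code and U as the [4, 3, 2] even-weight code, the generator matrix is taken in the block form with rows (0 | G_V) and (G_U | G_U), and minimum distance is found by brute-force enumeration. Function names are hypothetical.

```python
from itertools import product


def span(generator):
    """All codewords generated by a binary generator matrix (list of rows)."""
    n = len(generator[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(generator)):
        word = [0] * n
        for c, row in zip(coeffs, generator):
            if c:
                word = [(w + r) % 2 for w, r in zip(word, row)]
        words.add(tuple(word))
    return words


def min_distance(generator):
    """Minimum Hamming weight of a nonzero codeword of a linear code."""
    return min(sum(w) for w in span(generator) if any(w))


def u_u_plus_v(g_u, g_v):
    """Rows (0 | G_V) followed by rows (G_U | G_U)."""
    n = len(g_u[0])
    top = [[0] * n + list(row) for row in g_v]
    bottom = [list(row) + list(row) for row in g_u]
    return top + bottom


# Assumed example codes: V = [4, 1, 4] repetition, U = [4, 3, 2] even-weight.
G_V = [[1, 1, 1, 1]]
G_U = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1]]

G_C = u_u_plus_v(G_U, G_V)
d1, d2 = min_distance(G_V), min_distance(G_U)
print(len(G_C[0]), len(G_C), min_distance(G_C), min(d1, 2 * d2))
```

With these inputs the combined code has length 8, dimension 4, and minimum distance min(d_1, 2d_2) = 4.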

Similarly, the first three rows of Pascal’s triangle, reduced
modulo 3, form a universal generator matrix for a family of nested
ternary MDS codes of block length 3. Applying the basic multilevel
construction method results in a ternary “|U|2U + V|U + V + W|”
code construction method [5], in which C = U ⊗ 121 +
V ⊗ 110 + W ⊗ 100.

This  construction  is  easily  extended  to  GF{p)  for  any  prime 
p.  Massey  et  al.  have  shown  in  [6]  that  the  first  p  rows  of  Pascal’s 
triangle  reduced  modulo  p  form  a  universal  generator  matrix  for  a 
family  of  nested  MDS  codes  of  block  length  p  over  GF(p).  Thus, 
the  basic  multilevel  construction  method  may  be  applied  to  this 
case. 
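
The nesting property attributed to Pascal's triangle can be checked by brute force for small primes. This is a minimal sketch with hypothetical function names: it assumes the universal generator matrix has row i equal to the binomial coefficients C(i, 0), ..., C(i, p - 1) reduced modulo p, and that the [p, k] code is spanned by the last k rows. An MDS code of length p and dimension k should then have minimum distance p - k + 1.

```python
from itertools import product
from math import comb


def pascal_rows(p):
    """First p rows of Pascal's triangle modulo p, zero-padded to length p."""
    return [[comb(i, j) % p for j in range(p)] for i in range(p)]


def min_weight(rows, p):
    """Minimum Hamming weight over all nonzero GF(p)-combinations of rows."""
    n = len(rows[0])
    best = n
    for coeffs in product(range(p), repeat=len(rows)):
        if not any(coeffs):
            continue
        word = [sum(c * r[j] for c, r in zip(coeffs, rows)) % p
                for j in range(n)]
        best = min(best, sum(1 for x in word if x))
    return best


def nested_distances(p):
    """Minimum distance of the code spanned by the last k rows, k = 1..p."""
    rows = pascal_rows(p)
    return [min_weight(rows[p - k:], p) for k in range(1, p + 1)]


print(nested_distances(3))
print(nested_distances(5))
```

The enumeration grows as p^k, so this check is only practical for very small p.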

For arbitrary fields GF(q), we construct q-ary “Reed-Muller”
codes as follows. Beginning with the bi-infinite partition chain

···/[1,1]/[1,1]/[1,1]/[1,0]/[1,0]/[1,0]/···

we apply the multilevel construction technique taking as component
codes every sequence of q consecutive codes in the partition
chain. From this we obtain the bi-infinite partition chain

···/[q, q]/[q, q]/[q, q - 1]/···/[q, 1]/[q, 0]/[q, 0]/···.

We then apply the multilevel construction technique to this chain
of codes, resulting in a chain of codes of block length q^2. This
process may be repeated indefinitely, resulting in a family of block
codes having block lengths that are integer powers of q.

In  our  work  we  derive  formulas  for  the  dimension  and  mini¬ 
mum  distance  of  these  codes,  investigate  their  duality  properties, 
and show that these codes are natural analogs of the binary Reed-Muller
codes.

ON THE APPARENT DUALITY OF THE KERDOCK AND PREPARATA CODES


Roger  Hammons  and  P.  Vijay  Kumar 


Abstract  The Kerdock and Preparata codes are something
of an enigma in coding theory since they are both Hamming
distance invariant and have weight enumerators that are dual
under the MacWilliams transform just as if they were dual
linear codes. In this paper, we explain, by constructing in
a natural way a Preparata-like code P_L from the Kerdock
code K, why the existence of a distance-invariant code with
weight distribution that is the MacWilliams transform of that
of the Kerdock code is only to be expected. The construction
involves quaternary codes over the ring Z4 of integers modulo
4. We exhibit a quaternary code Q and its quaternary
dual Q^⊥ which, under the Gray map, give rise to the Kerdock
code K and Preparata-like code P_L respectively. The
code P_L is identical in weight and distance distribution to
the Preparata code. The linearity of Q and Q^⊥ ensures that
K and P_L are distance invariant, while their duality as quaternary
codes guarantees that K and P_L have dual weight
distributions.

SUMMARY

Recently, a family of nearly optimal four-phase sequences of
period M = 2^r - 1, r odd, with alphabet {1, j, -1, -j}, j =
√-1, was discovered first by Solé [1] and later independently
by Boztaş, Hammons, and Kumar [2]. After replacing each
complex fourth root-of-unity j^a by its exponent a ∈ {0, 1, 2, 3},
this family may be viewed as a linear quaternary code over
the ring Z4 of integers modulo 4. Since the family has low
correlation values, it also possesses large minimum Euclidean
distance and thus the potential for excellent error-correcting
capability.

An  analysis  [2]  of  the  correlation  properties  of  the  four-phase 
sequences  led  us  to  consider  the  2-adic  (i.e.,  base  2)  expansions 
of  the  quaternary  codewords.  Interestingly,  these  bore  a  strik¬ 
ing  resemblance  to  the  original  expression  [3]  for  the  nonlinear 
binary  Kerdock  code.  A  second  connection  with  the  Kerdock 
code  arose  during  attempts  to  construct  good  binary  codes  from 
the  four-phase  sequence  family  using  the  Gray  map.  This  was  a 
logical  step  to  pursue  as  the  Gray  map  translates  a  quaternary 
code  with  large  minimum  Euclidean  distance  into  a  binary  code 
of  twice  the  length  having  large  minimum  Hamming  distance. 
The  codes  that  resulted  were  nonlinear  and  had  the  same  pa¬ 
rameters  as  shortened  versions  of  the  Kerdock  code. 
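
The distance-preserving behaviour of the Gray map can be seen in a few lines. The sketch below assumes the usual convention 0 → 00, 1 → 01, 2 → 11, 3 → 10, which the summary does not spell out, and compares Lee distance on Z4 words with Hamming distance on their binary images; for four-phase symbols the squared Euclidean distance is twice the Lee distance. Function names are hypothetical.

```python
from itertools import product

# Assumed Gray map convention on Z4 symbols.
GRAY = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}


def gray_map(word):
    """Binary image, of twice the length, of a word over Z4."""
    bits = []
    for symbol in word:
        bits.extend(GRAY[symbol % 4])
    return tuple(bits)


def lee_distance(a, b):
    """Sum over positions of min(s, 4 - s), where s = (a_i - b_i) mod 4."""
    total = 0
    for x, y in zip(a, b):
        s = (x - y) % 4
        total += min(s, 4 - s)
    return total


def hamming_distance(a, b):
    """Number of positions in which two equal-length words differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Exhaustive check on all pairs of Z4 words of length 2.
words = list(product(range(4), repeat=2))
isometric = all(
    lee_distance(a, b) == hamming_distance(gray_map(a), gray_map(b))
    for a in words for b in words
)
print(isometric)
print(gray_map((0, 1, 2, 3)))
```

The map is not additive, which is why the binary image of a linear quaternary code is in general nonlinear.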

In  exploring  these  connections,  it  was  discovered  that  the 
original  quaternary  code  could  be  enlarged  in  a  natural  way,  as 
shown  in  Figure  1,  to  a  linear  quaternary  code  Q  whose  image 
under  the  Gray  map  is  precisely  the  Kerdock  code.  It  was  only 
natural  to  consider  whether  the  interesting  link  between  the 
Kerdock  code  and  a  linear  quaternary  code  could  also  be  used 
to  explain  the  apparent  duality  of  the  Kerdock  and  Preparata 
codes. 

The new perspective does indeed provide an explanation, although
not the one that might first be suspected. We show
that the binary images G(C) and G(C^⊥) under the Gray map of
a linear quaternary code C and its Z4-dual are always Hamming
distance invariant. Furthermore, these binary codes have the
property that their weight distributions are always dual under
the MacWilliams transform. As a consequence, the Kerdock
code possesses a natural “quaternary-dual” code P_L = G(Q^⊥)
identical in size, weight, and distance to the extended Preparata
code. Although the Preparata and Preparata-like (P_L)
codes have similar finite field transform descriptions, they are
in general not the same.

FIGURE 1. QUATERNARY CONNECTIONS

Interestingly, at length 16, the Preparata and Preparata-like
codes do coincide. In fact, the Kerdock code, the extended
Preparata code, and the Preparata-like code all coincide
with the Nordstrom-Robinson code. Thus, the Nordstrom-Robinson
code can be generalized in one way to get the extended
Preparata codes, in another way to get the Kerdock
codes, and in yet another way to get the Preparata-like codes!

From the standpoint of decoding, it is not necessary to distinguish
between the binary codes and their quaternary parents.
An important advantage in working in the Z4-domain,
where the codes are linear, is that it is meaningful to speak of
syndromes. Moreover, the codes Q and Q^⊥ are Z4-analogs of
the binary first-order Reed-Muller code RM(1, r) and its dual
RM(r - 2, r). This connection makes decoding of the Kerdock
and Preparata codes, at least conceptually, easier.

Normal and Abnormal Codes


Tuvi  Etzion*  Gadi  Greenberg^  Iiro  S.  Honkala* 


A lot of research in the area of covering radius is on
the normality of codes. The main reason is that by
using the amalgamated direct sum [1] construction
one can generate from normal codes sparse covering
codes with larger covering radius. An (n, d)R code C
is a code of length n, covering radius at most R, and
minimum distance at least d. An interesting question
in this context is to determine which codes are normal
and which codes are abnormal. One important
factor is the ratio between the covering radius of the
code and its minimum distance. Van Wee [5] proved
that all (n, 2R)R codes and all (n, 2R + 1)R codes are
normal. Hou [3] proved that all linear quasi-perfect
codes are normal. These results are strengthened by
the following theorem.

Theorem 1. If C is an (n, 2R - 1)R code, where
R does not divide n, then C is normal and all its
coordinates are acceptable.

All  the  abnormal  codes  which  are  known  [2], [4], [5] 
have  minimum  distance  1.  Three  constructions  (A, 
B,  and  C)  for  generation  of  abnormal  codes  are  given. 
The  constructions  that  we  use  are  modifications  of  the 
constructions  of  Frankl  [4]  and  van  Wee  [5].  The  con¬ 
structions  differ  in  the  structure  of  the  codes  which 
they  use. 

By applications of Construction A we show that
for most lengths there exist abnormal (n, R)R codes,
R ≤ 6.

Theorem 2. By applying Construction B on an
(n, d)R code we obtain an abnormal (n, d - 1)R + 1
code.

Theorem 3. For each R ≥ 1, there exists an n_0
such that for each n ≥ n_0 there exists an (n, M, R -
1)R code.

By Theorem 2 and the results of Vleduts and Skorobogatov
[6] we have

Theorem 4. For each t there exists an m_0 such
that for all m ≥ m_0 there exist abnormal (2^m -
1, 2t)2t and (2^m, 2t + 1)2t + 1 codes.

By applying Construction C on the extended Hamming
code, the punctured Preparata code, and the
Preparata code, we obtain that there exists an n_0
such that for each n ≥ n_0, there exist abnormal
(2^n, 3)2, (2^{2n} - 1, 4)3, and (2^{2n}, 5)4 codes, respectively.
One consequence is that it would be difficult
to extend Theorem 1.

TCH: A NEW FAMILY OF CYCLIC CODES LENGTH 2^m


( t ) F. A. B. CERCAS * ( t ) M. TOMLINSON ( t ) A. A. ALBUQUERQUE

SUMMARY


A new class of block codes of length 2^m, named TCH (Tomlinson,
Cercas and Hughes), has been found based on finite field theory. Sophisticated
computer techniques have been used to refine and extend
these  codes  to  other  binary  number  lengths  not  directly  achievable, 
as  the  number  of  code  lengths  lying  in  the  range  of  most  practical 
applications  is  extremely  scarce. 

TCH  codes  have  the  advantage  of  an  easy  implementation  of  the 
receiver  and  are  suitable  for  a  wide  range  of  applications  in  commu¬ 
nications  particularly  those  taking  place  in  adverse  environments  like 
fading,  Doppler  effects,  reflections  and  all  types  of  interference.  For 
this reason we looked for codes which could be at least cyclic and
with code length n = 2^m, m being a positive integer. This allows
the  implementation  of  a  maximum-likelihood  decoder  with  a  bank  of 
correlators  using  transform  techniques,  such  as  the  Fast  Fourier  Trans¬ 
form  (FFT),  while  keeping  the  total  number  of  correlators  as  low  as 
possible. 

TCH codes are nonlinear cyclic codes of length n = 2^m, as the
linear addition modulo 2 of two codewords does not necessarily produce
another valid codeword. They are cyclic in the sense that every cyclic
shift of any codeword is always a valid codeword. A TCH code is
then a block code closed under cyclic shifting but with the all-zero
codeword excluded. TCH codes can be defined in terms of h code
polynomials, P_i(x), i = 1 to h, where P_i(x) ≠ x^r P_j(x) mod n, i ≠ j, for
all time shifts r. TCH codes are also non-systematic and cannot be
defined in terms of a set of parity check equations, except in special
cases. The number of information bits k of a TCH(n,k,t) code, able
to correct t errors, or simply TCH(n,k), is given by:

k = m + log_2 h + 1  (1)

where the term 1 accounts for including the inverses of all codewords,
which are also valid codewords.

TCH codes can be generated in the following way: since we want
cyclic codes of length n = 2^m, we must find polynomials P(x) of length
n with coefficients a_i, i = 0, 1, ..., n - 1, from GF(2). Finite field
theory tells us that polynomials of degree n with coefficients from a
Galois field GF(q), where q is related to a prime number p by q = p^k,
k a positive integer, can be the field elements of GF(q). Restricting
the coefficients to GF(2), as required, and for k = 1, we can easily
construct basic TCH polynomials for all prime numbers p verifying the
following equation:

p = n + 1 = 2^m + 1 (2)

Basic TCH polynomials have the form:

P_1(x) = Σ_{i=0}^{n/2-1} x^{K_i} (3)

where the number of terms is n/2, so the number of ones equals the
number of zeros in the polynomial. The exponent values K_i are those
which verify the equation

α^{K_i} = 1 + α^{2i+1}, i = 0, 1, ..., n/2 - 1 (4)

for any given primitive root α of GF(q). Each of the existing α
generates a different P_1(x), containing 2n codewords, which is the basis,
h = 1, for a new TCH code.

A good method to expand the codeword set is to use the first
polynomial in a shift-and-add procedure, which consists of cyclically
shifting P_1(x) and adding the shifted polynomial to the original
one, in order to get a second polynomial P_2(x). If P_2(x) has a good
auto-correlation function and a good cross-correlation function with
P_1(x), for all time shifts, then P_2(x) is included in the codeword
set, h = 2, therefore doubling the total number of codewords. The
procedure is continued in the same way in order to increase the number
of information bits k for a given value of minimum distance d_min
required. The good results obtained with this method rely on the
structure of TCH codes itself. The cross-correlation coefficient
between P_1(x) and P_2(x) = P_1(x)[1 + x^r], where r is a time shift, is
given by:

C_j = n - 2W[P_1(x)(1 + x^j + x^{r+j})] (5)

where W[.] is the Hamming weight. C_j is virtually zero for j = -r.
For other time shifts the coefficients are evaluated and tested.
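
The weight expression for the cross-correlation can be checked numerically. The Python sketch below is illustrative only: the 16-bit sequence `p1` is an arbitrary balanced stand-in (not a TCH polynomial from the paper), and all function names are assumptions. It compares C_j = n - 2W[P_1(x)(1 + x^j + x^{r+j})] with a direct count of agreements minus disagreements between P_1(x) and the shifted P_2(x) = P_1(x)[1 + x^r], with polynomials taken modulo x^n - 1 over GF(2).

```python
def cyclic_shift(bits, s):
    """Multiply a polynomial by x^s modulo x^n - 1 (cyclic shift by s places)."""
    n = len(bits)
    return [bits[(i - s) % n] for i in range(n)]


def add_gf2(a, b):
    """Coefficient-wise addition over GF(2)."""
    return [x ^ y for x, y in zip(a, b)]


def cross_correlation(p1, r, j):
    """C_j = n - 2 W[P1(x) (1 + x^j + x^(r+j))], W being the Hamming weight."""
    n = len(p1)
    total = add_gf2(p1, add_gf2(cyclic_shift(p1, j), cyclic_shift(p1, r + j)))
    return n - 2 * sum(total)


def direct_correlation(p1, r, j):
    """Agreements minus disagreements between P1 and x^j P2, P2 = P1 (1 + x^r)."""
    p2 = add_gf2(p1, cyclic_shift(p1, r))
    shifted = cyclic_shift(p2, j)
    agree = sum(1 for a, b in zip(p1, shifted) if a == b)
    return agree - (len(p1) - agree)


# Arbitrary balanced test sequence (8 ones, 8 zeros); hypothetical example data.
p1 = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0]
n = len(p1)
for r in range(1, n):
    # For j = -r the bracket reduces to x^(-r), so the weight is n/2 and C_j = 0.
    assert cross_correlation(p1, r, (-r) % n) == 0
```

For a balanced sequence the coefficient at j = -r vanishes exactly, which is the property the text relies on; the remaining shifts have to be evaluated one by one.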

Although the ratio k/n of TCH codes found so far is relatively
small, it is shown [1, 2] that low-rate coding using TCH codes can
have significant advantages. The fact that TCH codes have length
2^m and not 2^m - 1 like BCH codes dramatically reduces the total
number of transforms through a decoding process with just two steps:
in the first step the modulus of h transforms is evaluated, choosing
the code polynomial which best matches, and in the second step the
computation of its 2^m phases is performed so as to decide what was the
most likely codeword sent. The speed gain S_g, defined as the ratio
between the total number of operations needed to perform maximum
likelihood decoding and the number of operations needed by a TCH
decoder, is given by:

_  A2’"  A  >.2* 

*  ~  A  -I-  2’"  “  1  -F  A  -  2n*  2*  '  ^ 

For the TCH(512,16,111) codes found so far the decoding process
can be sped up by more than 100 times (S_g = 102.4), and just a
bit less (S_g = 83.3) for the TCH(256,16,54) codes [2].

The  use  of  TCH  polynomial  codewords  as  PN  sequences  can  also  be 
advantageous as there is very little spectral overlap between them [2],
or  in  other  words,  low  cross-correlation.  It  can  be  shown  that  the 
average  signal-to-interference  ratio  for  the  mentioned  TCH(256,16) 
code  is  24.3  dB,  which  is  identical  to  the  best  code  of  this  approximate 
length,  the  (255,16)  Kasami  code. 
Abstract

We examine the security of several digital signature schemes
based on algebraic block codes. It is shown that Xinmei's digital
signature scheme can be totally broken by a known plaintext
attack with complexity O(k^3), where k is the dimension of the
code used in the scheme. Harn and Wang have proposed a modified
version of Xinmei's scheme that prevents selective forgeries.
Their scheme is also shown to be vulnerable to a known plaintext
attack. We then present a new signature scheme that we believe
to be resistant to the previously described attacks.

1  Xinmei’s  Digital  Signature  Scheme 

Xinmei's digital signature scheme [1] attempts to base its security on
the intractability of the general decoding problem and the difficulty
of factoring large matrices. Each user, say user A, chooses an (n,k)
binary Goppa code C_A that has the ability to correct t_A errors. A k × n
binary generator matrix G_A and an (n - k) × n binary parity check
matrix H_A are selected for C_A. User A then finds the n × k binary
matrix G_A^* such that G_A G_A^* = I_k, where I_k is the k × k identity matrix.

User A selects a nonsingular binary n × n matrix P_A and a nonsingular
binary k × k matrix S_A. User A completes the set-up of the system
by constructing the matrices J_A = P_A^{-1} G_A^* S_A^{-1},
W_A = G_A^* S_A^{-1}, and T_A = P_A^{-1} H_A^T.

The public key consists of J_A, W_A, T_A, H_A, t_A, and t', where
t' is an integer such that t' < t_A. The private key consists of the two
matrices S_A G_A and P_A.

User A obtains the n-bit signature c_j of the k-bit message m_j
by computing c_j = (e_j ⊕ m_j S_A G_A) P_A, where e_j is an n-bit error vector
with Hamming weight w_H(e_j) = t' chosen at random by user A.

The receiver validates the possibly noise-corrupted signature
c_j through the use of the Berlekamp-Massey algorithm and the public
key [1].

2 Cryptanalysis and a Modification of Xinmei's Scheme

In [2] the authors showed that the linearity of the code and knowledge
of the error vectors could be exploited in a chosen-plaintext attack that
results in a complete break of Xinmei's scheme. The attack transforms
the cryptanalytic problem into a pair of systems of linear equations,
one containing n equations in n variables, and the other containing k
equations in k variables. The complexity of the attack is thus O(n^3).

It  was  also  observed  by  Harn  and  Wang  in  [3]  that  the  com¬ 
bination  of  valid  signatures  of  some  messages  yields  a  valid  signature 
for  another  message.  Harn  and  Wang  [3]  proposed  a  modification  of 
Xinmei’s  scheme  that  appears  to  secure  it  against  selective  forgery. 
Their scheme requires that user A publish the same public keys as in
Xinmei's scheme, with the further restriction that P_A be a permutation
matrix. In addition, they introduced a one-way hashing function
h that is made public. The hashing function accepts an l-bit vector
and produces a k-bit vector, where l > k, thus implementing a form of
compression.

The n-bit signature c_j of the l-bit message m_j is obtained by
computing c_j = h(m_j) S_A G_A P_A. When the signature c_j is transmitted,
it becomes susceptible to errors induced by additive channel noise
e_j. The received signature is thus denoted by ĉ_j, where ĉ_j = e_j ⊕
h(m_j) S_A G_A P_A. The signature is verified in a manner similar to that
in Xinmei's original scheme.


In [4] it is shown that Harn and Wang's scheme is susceptible
to a known-plaintext attack. Since the error vectors are revealed
during the verification process, we can obtain the expression
ĉ_j ⊕ e_j = h(m_j) S_A G_A P_A. Let m_1, m_2, ..., m_k be k distinct messages,
h(m_1), h(m_2), ..., h(m_k) their respective images under the function h,
and c_1, c_2, ..., c_k the corresponding signatures. A linear system of
equations is then created: [ĉ_j ⊕ e_j] = [h(m_j)] S_A G_A P_A.

If the vectors [h(m_j)] are linearly independent, S_A G_A P_A can be
obtained in O(k^3) operations.
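
The linear-algebra step of this known-plaintext attack can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the procedure of [4]: the function names are hypothetical, the 3 × 5 matrix `secret` merely stands in for the product S_A G_A P_A, and the rows of `hashes` play the role of k linearly independent hash images h(m_j). Given the noise-free signatures, the product is recovered by inverting the hash matrix over GF(2), which costs on the order of k^3 bit operations.

```python
def gf2_matmul(a, b):
    """Matrix product over GF(2); matrices are lists of 0/1 rows."""
    return [[sum(x & y for x, y in zip(row, col)) % 2 for col in zip(*b)]
            for row in a]


def gf2_inverse(a):
    """Invert a square binary matrix by Gauss-Jordan elimination over GF(2)."""
    k = len(a)
    aug = [row[:] + [int(i == j) for j in range(k)] for i, row in enumerate(a)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if aug[r][col]), None)
        if pivot is None:
            raise ValueError("hash vectors are not linearly independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col and aug[r][col]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def recover_key(hashes, clean_signatures):
    """Solve [c_j xor e_j] = [h(m_j)] K for the k x n matrix K."""
    return gf2_matmul(gf2_inverse(hashes), clean_signatures)


# Toy data (hypothetical): K stands in for S_A G_A P_A.
secret = [[1, 0, 1, 1, 0], [0, 1, 1, 0, 1], [1, 1, 0, 0, 1]]
hashes = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
signatures = gf2_matmul(hashes, secret)   # what the eavesdropper observes
recovered = recover_key(hashes, signatures)
```

Once the product is known, a forger can sign any message by hashing it and multiplying by the recovered matrix, which is why linear independence of the k hash images is the only requirement.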

3  A  New  Digital  Signature  Scheme 

A  system  is  proposed  that  uses  a  series  of  intentional  error  vectors 
that  are  in  the  same  coset  as  the  maximum  likelihood  error  pattern, 
but  have  higher  weight.  These  error  vectors  thus  cannot  be  obtained 
through  standard  decoding  techniques,  making  the  proposed  system 
immune  to  the  above  attacks. 

A function f(x, y) is made available to all users. f is a nonlinear
invertible function where x is a binary k-tuple, y is a binary n-tuple,
and the output value is a binary k-tuple.

Each user, say User A, selects an (n, k) binary irreducible Goppa
code C_A that has the ability to correct some t_A errors. User A then
selects a generator matrix G_A and a parity check matrix H_A, and finds
an n × k binary matrix G_A^* such that G_A G_A^* = I_k. A nonsingular
binary n × n matrix P_A is generated and the matrices G'_A = P_A^{-1} G_A^*
and H'_A = P_A^{-1} H_A^T are computed.

Finally User A selects an n × l binary matrix W_A of rank n,
where n < l, and determines W_A^* such that W_A W_A^* = I_n.

The public key consists of G'_A, H'_A, W_A^*, t_A, and t', where t' is
an integer such that t' < t_A. The private key consists of the matrices
G_A, P_A, G_A^*, and W_A.

A k-bit message m_j is signed in the following manner. A random
binary error vector of length n and weight t_A is selected. A random
l-bit vector of arbitrary weight is also selected. The l-bit signature
is then computed using the following expression.

i,  =  S.,G’a\Ga)Pa  ®tjWl}WA  ©e,.  (1) 

The  signature  is  validated  by  first  computing  Vj  =  The 

Berlekamp- Massey  algorithm  is  then  applied  to  yy  to  obtain  an  esti- 
mateofx^.  The  remainder  of  the  public  key  is  used  to  obtain 
which  is  then  compared  to  the  value  computed  by  the  receiving  user. 
Abstract: It is known that equivalent linear block codes may have
different  minimal  trellis  structures.  The  minimum  complexity  among 
all  minimal  trellis  structures  of  equivalent  codes  is  defined  as  the  trel¬ 
lis  complexity  of  the  class  of  equivalent  codes.  Sharper  lower  bounds 
for  trellis  complexity  are  derived  when  more  information  about  the 
infrastructure  of  codes  is  supplied.  These  bounds  serve  as  a  starting 
specification  for  a  search  algorithm  to  find  optimal  permutations  un¬ 
der  which  the  permuted  codes  achieve  the  trellis  complexity.  A  simple 
application  to  the  class  of  equivalent  binary  [17,9]  quadratic  residue 
codes  finds  the  trellis  complexity  is  five. 

Let C be an [n,k,d] linear block code over GF(q). Let C^⊥ be
its dual code with minimum distance d^⊥. Let S_n be the set of
all permutations on the n coordinates of codewords. Let σ(C) be
the equivalent code of C under a permutation σ in S_n. Let
σ(C)_{p,i} (σ(C)_{f,i}) be the past (future) subcode of σ(C) which consists of
codewords whose future (past) coordinates to position i are all
zero. Let k_{p,i}(σ) (k_{f,i}(σ)) be the dimension of the past (future)
subcode σ(C)_{p,i} (σ(C)_{f,i}). The dimension k_{s,i}(σ) of the state space
at position i in a minimal trellis of σ(C) is [1]

k_{s,i}(σ) = k - k_{p,i}(σ) - k_{f,i}(σ).

Let s(σ(C)) be the maximum value of k_{s,i}(σ) over 0 ≤ i ≤ n. The
trellis complexity s of the class of equivalent codes of C is defined
as

s = min_{σ∈S_n} s(σ(C)).

Let

K_{p,i} = max_{σ∈S_n} k_{p,i}(σ),  K_{f,i} = max_{σ∈S_n} k_{f,i}(σ),  K_{s,i} = k - K_{p,i} - K_{f,i}.

Note that K_{p,i} = K_{f,n-i}. Since k_{p,i}(σ) ≤ K_{p,i} and k_{f,i}(σ) ≤ K_{f,i},
we have

k_{s,i}(σ) ≥ K_{s,i}.
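
The state-space dimension profile of a given coordinate ordering can be computed directly from a generator matrix, since the past (future) subcode dimension equals k minus the rank of the generator matrix restricted to the future (past) coordinates. The Python sketch below is an illustration with hypothetical helper names, not the search algorithm of the paper. It uses nine cyclic shifts of a degree-8 binary polynomial as the generator rows of a length-17, dimension-9 code; the polynomial 1 + x^3 + x^4 + x^5 + x^8 is assumed here to be the generator of the [17,9] quadratic residue code.

```python
def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as integer bit masks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)      # clear the leading bit of b if it is set in v
        if v:
            basis.append(v)
    return len(basis)


def restrict(row, cols):
    """Pack the entries of `row` at the coordinates `cols` into an integer."""
    return sum(row[c] << t for t, c in enumerate(cols))


def state_profile(gen_rows, order):
    """k_s,i = k - k_p,i - k_f,i for i = 0..n under the coordinate ordering `order`."""
    k = gf2_rank([restrict(r, order) for r in gen_rows])
    profile = []
    for i in range(len(order) + 1):
        rank_past = gf2_rank([restrict(r, order[:i]) for r in gen_rows])
        rank_future = gf2_rank([restrict(r, order[i:]) for r in gen_rows])
        # k_p,i = k - rank_future and k_f,i = k - rank_past
        profile.append(rank_past + rank_future - k)
    return profile


# Generator rows: shifts of g(x) = 1 + x^3 + x^4 + x^5 + x^8 (assumed QR generator).
g = [1, 0, 0, 1, 1, 1, 0, 0, 1]
G = [[0] * s + g + [0] * (8 - s) for s in range(9)]
natural = state_profile(G, list(range(17)))
```

In the natural cyclic ordering the profile peaks at min(k, n - k) = 8. Passing a different list as `order` evaluates a permuted code in the same way; how a permutation written in the paper's notation maps onto that list is a convention the sketch does not settle.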

In general, K_{p,i} and K_{f,i} are intrinsic attributes of the class of
equivalent codes of code C. K_{f,i} may be estimated by N(α,β)
[2], which is the minimum possible block length for a linear block
code to have minimum distance α and dimension β, as follows:

1. If i ≤ N(d^⊥, j) - 1, then K_{f,i} ≤ k - i + j - 1.

2. If i ≥ n - N(d, j) + 1, then K_{f,i} ≤ j - 1.

More precisely, for binary codes and early and late positions i,
K_{f,i} can be evaluated as follows:

K_{f,i} = k - i,      if 0 ≤ i ≤ d^⊥ - 1,
          k - i + 1,  if d^⊥ ≤ i ≤ d^⊥ + ⌈d^⊥/2⌉ - 1,
          1,          if n - (d + ⌈d/2⌉ - 1) ≤ i ≤ n - d,
          0,          if n - d + 1 ≤ i ≤ n.

Two monotone sequences 0 = k_{p,0} ≤ k_{p,1} ≤ ... ≤ k_{p,n} = k
and k = k_{f,0} ≥ k_{f,1} ≥ ... ≥ k_{f,n} = 0 together are called a
specification of past and future dimensions if they satisfy

1. 0 ≤ k_{p,i} - k_{p,i-1} (k_{f,i-1} - k_{f,i}) ≤ 1, for all 1 ≤ i ≤ n;

2. k - k_{p,i} - k_{f,i} ≥ 0 for all 0 ≤ i ≤ n.

The minimal trellis structure of an equivalent code σ(C) is
said to be dominated by a specification as above if all its
past dimensions k_{p,i}(σ) and future dimensions k_{f,i}(σ) are upper
bounded by k_{p,i} and k_{f,i} respectively at each position i. Necessary
and sufficient conditions for the existence of a permutation
σ under which the minimal trellis structure of the permuted code
is close to and dominated by a specification are developed. A
constructive algorithm is then built to search for optimal
permutations under which permuted codes can achieve the trellis
complexity.

Let C be the binary [17,9] quadratic residue code generated
by g(x) = 1 + x^3 + x^4 + x^5 + x^8. Let C^⊥ be its dual code. The
minimum distances of C and C^⊥ are d = 5 and d^⊥ = 6. By applying
the above results, we can list K_{f,i}, K_{p,i}, and K_{s,i} in the following
table:

i        0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
K_{p,i}  0  0  0  0  0  1  1  1  2  2  3  4  4  5  6  7  8  9
K_{f,i}  9  8  7  6  5  4  4  3  2  2  1  1  1  0  0  0  0  0
K_{s,i}  0  1  2  3  4  4  4  5  5  5  5  4  4  4  3  2  1  0

Hence, the trellis complexity of the class of equivalent binary
[17,9] QR codes is not smaller than 5. To find optimal permutations
and then to determine the exact trellis complexity, we start
our search algorithm with the following specification of future
and past dimensions k_{p,i}, k_{f,i}, a very slight variation from the
above table, listed in the next table:

i        0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
k_{p,i}  0  0  0  0  0  0  1  1  2  2  3  3  4  5  6  7  8  9
k_{f,i}  9  8  7  6  5  4  3  3  2  2  1  1  0  0  0  0  0  0
k_{s,i}  0  1  2  3  4  5  5  5  5  5  5  5  5  4  3  2  1  0

With the above specification, we have constructed four optimal
permutations:

σ = (1,4,5,6,9,7,10,2,14,17,3,15,8,11,12,13,16)
or (1,4,5,6,9,10,7,2,14,17,3,15,8,11,12,13,16)
or (1,4,5,6,9,7,10,2,14,17,15,3,8,11,12,13,16)
or (1,4,5,6,9,10,7,2,14,17,15,3,8,11,12,13,16).

With any one of the above permutations, we have


i           0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
k_{p,i}(σ)  0  0  0  0  0  1  1  1  2  2  3  3  4  5  6  7  8  9
k_{f,i}(σ)  9  8  7  6  5  4  3  3  2  2  1  1  1  0  0  0  0  0
k_{s,i}(σ)  0  1  2  3  4  4  5  5  5  5  5  5  4  4  3  2  1  0

Thus,  the  trellis  complexity  of  the  class  of  equivalent  binary  [17,9] 
QR  codes  is  5. 
CHANNEL EQUALIZATION FOR BLOCK TRANSMISSION SYSTEMS

In a Block Transmission System the information symbols
are arranged in blocks alternating with blocks
of L known symbols, L being the length of the channel
memory*. The latter help to identify the channel and to
process each information block independently of the
others; their interference on the sampled output of the
matched filter is calculated and then subtracted.
We thus obtain the observation vector Y = R D + U,
where R is an M×M Toeplitz Hermitian
matrix which represents channel distortion, D = (d_M,
d_{M-1}, ..., d_1)^t is the information symbol vector and U is a
Gaussian zero-mean noise vector whose covariance
matrix is 2N_0 R. The known receiver for this system is
the Nonlinear Data-Directed Estimator (NDDE)*.
Its complexity is O(M^3): it has to solve, using the
Levinson algorithm, M/2 Toeplitz systems of decreasing
order.

In this paper we extend the equalization techniques to
Block Transmission Systems and deduce decision-feedback
equalizers that have better performance and less
complexity than the NDDE. The fact that the observation
and its noise are snapshots from stationary time-series
suggests the use of a nonstationary innovations
representation based on Cholesky factorization of the
matrix R as R = H^{*t} Σ^2 H, where H is an upper triangular
matrix with 1's along the main diagonal and Σ is diagonal
with positive real entries Σ_ii = σ_i, i = 0, 1, ..., M-1. The
factors H and Σ are obtained from the Schur algorithm.
We use this representation to deduce the following
processors.

1. Noise Whitener (Σ^{-1} (H^{*t})^{-1}). It transforms the
observation Y into Z = Σ H D + V. The covariance matrix
of the noise V is 2N_0 I. The vector Z is obtained from Y
without inverting H^{*t}, thanks to its triangular structure.

2.  Maximum  Likelihood  Block  Detector.  It  may  use 
the  Viterbi  algorithm.  The  main  difference  with  the 
conventional maximum likelihood sequence detector is
that  the  channel  seen  by  the  detector  is  time-varying. 

3. Zero-Forcing Block Linear Equalizer (ZF-BLE).
It gives X = (Σ H)^{-1} Z = R^{-1} Y = D + W, also without
inverting H. Suboptimum symbol-by-symbol decisions are
obtained  from  X  via  a  threshold  detector.  The  signal-to- 
noise  ratio  in  the  decision  variable  on  symbol  dj,  is  SNRj 
=  £,  /(Noao^  (R  'lM-iju-i ).  where  £,  is  the  symbol  energy. 

4. Zero-Forcing Block Decision-Feedback
Equalizer (ZF-BDFE). We obtain from the noise
whitener Σ^{-1} Z = H D + Σ^{-1} V. The transformation H is
causal and triangular. Thus, starting with a decision on d_1,
the decision on symbol d_i can be obtained with the help of
decisions on previous symbols, as made in the


conventional  DFE.  We  have  SNR|  =  EJiNoOo^).  We 
show that this performance is better than that of NDDE,
ZF-BLE, and the conventional ZF-DFE. The complexity
is O(LM).
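
The whitening and decision-feedback steps of the ZF-BDFE can be sketched for a real-valued channel with binary (±1) symbols. The Python code below is a minimal illustration under stated assumptions, not the authors' implementation: it uses a textbook Cholesky factorization R = L L^t instead of the Schur algorithm, with L^t playing the role of Σ H (so Σ is the diagonal of L), and every function name is hypothetical.

```python
import math


def toeplitz(r, m):
    """Symmetric m x m Toeplitz matrix built from autocorrelation lags r[0], r[1], ..."""
    return [[r[abs(i - j)] if abs(i - j) < len(r) else 0.0 for j in range(m)]
            for i in range(m)]


def cholesky_lower(R):
    """Lower-triangular L with R = L L^t (R real, symmetric, positive definite)."""
    m = len(R)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = R[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def zf_bdfe(R, Y):
    """Whiten Y, then detect symbols by back-substitution with decision feedback."""
    m = len(R)
    L = cholesky_lower(R)
    # Noise whitener: solve L Z = Y by forward substitution, so Z = L^t D + V.
    Z = [0.0] * m
    for i in range(m):
        Z[i] = (Y[i] - sum(L[i][k] * Z[k] for k in range(i))) / L[i][i]
    # L^t is upper triangular: the last entry of D is detected first and fed back.
    D_hat = [0.0] * m
    for i in range(m - 1, -1, -1):
        feedback = sum(L[k][i] * D_hat[k] for k in range(i + 1, m))
        x = (Z[i] - feedback) / L[i][i]
        D_hat[i] = 1.0 if x >= 0 else -1.0   # threshold detector for +/-1 symbols
    return D_hat


# Noise-free example: autocorrelation lags of the (assumed) channel taps [1, 0.5].
R = toeplitz([1.25, 0.5], 8)
D = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, 1.0, -1.0]
Y = [sum(R[i][j] * D[j] for j in range(8)) for i in range(8)]
detected = zf_bdfe(R, Y)
```

Both substitutions only touch the nonzero band of L when R is banded, which is consistent with a complexity that grows as LM rather than as a power of M.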

5. Minimum-Mean-Squared-Error Block Linear
Equalizer (MMSE-BLE). The performance degradation
of the ZF-BLE can be reduced by inserting between R^{-1}
and the threshold detector a Wiener estimator Ψ to obtain
X' = Ψ X = D + W', where the power of every
component w'_i of W' is minimized. The cascade of R^{-1}
and Ψ is a transformation R'^{-1} = Ψ R^{-1} whose input is Y
and output is X', where R' = [R + (2N_0/E|d_i|^2) I]. The
covariance of the error W' is 2N_0 R'^{-1}. The SNR_i is
(E|d_i|^2 / E|w'_i|^2) - 1.


6. Minimum-Mean-Squared-Error Block
Decision-Feedback Equalizer (MMSE-BDFE). The
observation can be written as Y = R' D + U', where
the error U' has a covariance matrix 2N_0 R'. The matrix
R' can be Cholesky-factored as R' = H'^{*t} Σ'^2 H'. As above,
a noise whitener (Σ'^{-1} (H'^{*t})^{-1}) can be used and its output is
processed using a decision-feedback strategy as in ZF-BDFE.
The decision vector is S = D + Δ, where the error
Δ is a mixture of noise and residual intersymbol
interference. Its covariance is 2N_0 Σ'^{-2}. The SNR_i is
(E|d_i|^2 / E|δ_i|^2) - 1, where δ_i is a component of Δ. This
performance is better than that of NDDE, ZF-BDFE,
and the conventional MMSE-DFE. The complexity is
O(LM).

Conclusions: Whereas conventional equalizers use
transversal filters, the derived ones involve the use of
matrix transformations, as expected. These
transformations can be implemented exactly, while what
is deduced from the theory of conventional equalization is
approximated by simple implementable filters. Moreover,
assuming the channel impulse response (or its estimate) is
available, the equalizer coefficients (the matrix entries)
are easily calculated using the Levinson or Schur
algorithms. We use the latter because it implies
complexity reduction and allows us to use a decision-feedback
strategy. We have evaluated the performance of
the deduced equalizers and compared it with that of
known ones. The ZF and MMSE block decision-feedback
equalizers are particularly attractive because of their
better performance and lower complexity as compared
with the known NDDE.


* F. Hsu, "Data Directed Estimation Techniques for
Single-Tone HF Modems," IEEE Military Commun.
Conf., Boston, MA, Oct. 1985.
Summary

We consider a communication channel that is corrupted by both
finite ISI and additive white Gaussian noise. The impulse response of the
channel is h(t) and we assume BPSK modulation. We assume that h(t) is
time-limited to nT, where 1/T is the rate at which data is transmitted on
the channel. Hence the output of the channel is given by

r(t) = Σ_k (2a_k - 1) h(t - kT) + n(t),  (1)

where n(t) is additive white Gaussian noise with two-sided power spectral
density N_0/2 and a_k is equally likely to be 0 or 1. The receiver filter is
g(t), the whitened matched filter corresponding to h(t). The noise
samples at the output of the sampler are uncorrelated Gaussian random
variables with zero mean and variance equal to N_0/2. If we denote the
sampled outputs of g(t) by {y_k}, then

y_k = v_k + n_k,

where

v_k = Σ_{i=0}^{n-1} f_i (2a_{k-i} - 1)

and

E[n_k n_j] = (N_0/2) δ_{kj}.  (2)

In (2), the symbol E[·] stands for expectation and

δ_{kj} = 1 if k = j, 0 else.


The set of constants {f_i} depends on the pulse autocorrelation function of
the impulse response.

Another way to represent the same ISI channel is as a trellis code. It
is equivalent to a rate 1/n binary-input, linear convolutional code,
followed by a nonlinear mapping. Each input to the channel a_k results in
an n-bit codeword at the output of the encoder given by

C_k = (a_k, a_{k-1}, a_{k-2}, ..., a_{k-n+1})  (3)

The nonlinear mapping in our case (i.e., for the ISI channel) is given by

M(C_k) = Σ_{i=0}^{n-1} f_i (2a_{k-i} - 1)  (4)

Therefore, using this notation,

y_k = M(C_k) + n_k  (5)

Consider the following definitions:

Definition 1: The polynomial f(D) that characterizes the ISI channel is
given by

f(D) = Σ_{i=0}^{n-1} f_i D^i  (6)

Definition 2: Let E_k be a binary n-tuple given by (e_k, e_{k-1}, ..., e_{k-n+1}).
Then the squared Euclidean error weight of C_k with respect to E_k is given
by

d^2(C; E) = ||M(C) - M(C ⊕ E)||^2  (7)

where the symbol ⊕ stands for the bit-by-bit modulo-2 (logical XOR)
operation on two vectors of length n. Since there are at most 2^n channel
signals, there can be at most 2^{n-1}(2^n - 1) + 1 possible values for the
Euclidean weight.

Definition 3: Let A be a set of binary n-tuple vectors C (or,
equivalently, a set of channel signals). The weight profile of the set A with
respect to a given error vector E, denoted F(A, E, W), is given by

F(A, E, W) = Σ_α m_α W^α,  (8)

where m_α is the number of elements in the set A that have a squared
Euclidean error weight α with respect to E.


It is now straightforward to prove the following lemma:

Lemma: Let A be the set of all binary n-tuples. Then the subset A_s of A
defined by

A_s = {C | C = (x, ..., x, 0)},

where the first n-1 components x are arbitrary binary numbers,
forms a subgroup of A under the operation ⊕ defined in (7).

The set A_s has cardinality 2^{n-1}. The only coset of A_s, denoted A_c,
may be formed from A_s as

A_c = {C' | C' = C ⊕ (1 1 ... 1), where C ∈ A_s},

with (1 1 ... 1) the length-n vector of all ones.

This follows from the fact that the all-one vector is not in A_s. Hence
adding this vector to all elements of A_s forms the only coset of A_s. A_c also
has cardinality 2^{n-1}.

Theorem: For any error pattern E of length n and for any partial
response channel f(D) with (n - 1) interfering symbols,

F(A_s, E, W) = F(A_c, E, W).

The proof is presented in [1].
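
The theorem can be checked by brute force for small n. The Python sketch below is an illustration only; the tap values `f` and all function names are assumptions chosen for the example. It enumerates the subgroup of n-tuples ending in 0 and its coset, computes the weight profile of each as a multiset of squared Euclidean error weights, and compares the two profiles for every error pattern E.

```python
from collections import Counter
from itertools import product


def mapping(c, f):
    """M(C) = sum_i f_i (2 a_{k-i} - 1) for the n-tuple C = (a_k, ..., a_{k-n+1})."""
    return sum(fi * (2 * a - 1) for fi, a in zip(f, c))


def weight_profile(A, e, f):
    """Multiset of squared Euclidean error weights of the set A with respect to E."""
    profile = Counter()
    for c in A:
        flipped = tuple(a ^ b for a, b in zip(c, e))
        d2 = (mapping(c, f) - mapping(flipped, f)) ** 2
        profile[round(d2, 9)] += 1   # the exponent alpha and its multiplicity m_alpha
    return profile


def subgroup_and_coset(n):
    """A_s: n-tuples whose last component is 0; A_c: A_s plus the all-one vector."""
    a_s = [c + (0,) for c in product((0, 1), repeat=n - 1)]
    a_c = [tuple(a ^ 1 for a in c) for c in a_s]
    return a_s, a_c


f = (1.0, 0.5, 0.25)                  # hypothetical ISI coefficients, n = 3
a_s, a_c = subgroup_and_coset(len(f))
profiles_match = all(
    weight_profile(a_s, e, f) == weight_profile(a_c, e, f)
    for e in product((0, 1), repeat=len(f))
)
```

The equality holds because complementing every bit of C negates M(C) and M(C ⊕ E) alike, leaving the squared difference unchanged.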


For the ISI channel with n = 2, the subset A_s corresponds to all
possible outputs from the all zero state and A_c is the only coset of A_s.
Hence this channel satisfies the conditions imposed in [2] for a class of
trellis codes, namely a) the trellis is based upon a binary, linear
convolutional code of rate (n - 1)/n with a nonlinear mapping from the
encoder outputs to channel input symbols, and b)
F(A_s, E, W) = F(A_c, E, W). Thus, we can apply a modified generating
function of the ISI channel with one interfering symbol that involves only
2 states in the state diagram. This modified generating function can then
be used to compute the probabilities of both event errors and bit errors. In
the computation of the generating function, we may assume that the initial
state is the all zero state, without loss of any generality. The edge labels
are, however, L F(A_s, E, W) I^r, where r is the number of data bit errors. If
we denote the resulting generating function T(W, L, I), then the
probability of event error P_e and the probability of bit error P_b are
upper-bounded by


r(iv  =  exp 


4No 


=  1). 


and 


d/ 


(W  =exp 


2 

2 


./  =  !). 


respectively, where d_f^2 is the normalized minimum squared free Euclidean
distance of the ISI channel and E_s is the average channel symbol energy.


References 

[1] S. A. Raghavan, J. K. Wolf, and L. B. Milstein, "On the Performance
Evaluation of ISI Channels," accepted in IEEE Trans. on Inform.
Theory.

[2] E. Zehavi and J. K. Wolf, "On the Performance Evaluation of Trellis
Codes," IEEE Trans. on Inform. Theory, IT-33, 196-202, March
1987.


This work was partially supported by the National Science Foundation under grant NCR 9105639,
and the Center for Magnetic Recording Research at the University of California, San Diego.




Performance  of  M-Algorithm  Receivers  With  Imperfect  Channel  Estimates 


F.  Gozzo 

IBM  -  Federal  Systems  Company 
Owego,  NY  13827-1298 


J.B.  Anderson 

Rensselaer  Polytechnic  Institute 
Troy,  NY  12180-3590 


I. Introduction

In any practical system, the channel estimate can be
inaccurate for one of many reasons including finite-length
training sequences, quantization, time-varying channels, and
truncation. Thus, the performance of any receiver will
typically degrade under mismatched channel conditions.
Although this degradation in performance was extensively
studied for MLSE receivers by Divsalar [1], the optimality of
the mismatched MLSE receiver was not addressed.
Unfortunately, deriving the optimal receiver under arbitrary
mismatched channel conditions is clearly an intractable
problem. This paper presents test results in an effort to better
understand the performance of MLSE receivers in arbitrary
channel mismatch conditions.

II. The M-Algorithm

The M-algorithm, which has become increasingly popular in
communications applications [2]-[7], gives the designer the
ability to easily trade off complexity and performance. It
performs a breadth-first search of an ISI tree (or trellis), but
only keeps the best M paths, up to a decision depth, D. As
each new signal is received, the algorithm extends the M
paths, sorts them according to their cost, and retains only the
best M paths. We will denote the M-algorithm receiver as
MA(M,D), where M is the number of paths to keep and D is
the decision depth. Assuming the decision depth is long
enough, the single parameter, M, enables one to investigate a
continuum of practical MLSE-based receivers, from
reduced-search to full-complexity MLSE.
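
The extend, sort, and retain cycle described above can be sketched as follows. This Python code is a minimal illustration, not the receiver evaluated in the paper: it assumes binary ±1 symbols, a squared-error path cost, no transmission before time 0, and, like the implementation mentioned in the test results, it ignores path merging. The function names are hypothetical. Passing an `est_taps` list that differs from the true channel models a mismatched channel estimate.

```python
def channel_output(symbols, taps, k):
    """Noise-free ISI output at time k; symbols before time 0 are taken as absent."""
    return sum(t * symbols[k - i] for i, t in enumerate(taps) if k - i >= 0)


def m_algorithm(received, est_taps, M, depth):
    """MA(M, depth): breadth-first search keeping the M lowest-cost paths."""
    paths = [(0.0, [])]               # (accumulated cost, symbol history)
    decided = []
    for k, y in enumerate(received):
        extended = []
        for cost, history in paths:
            for s in (-1, 1):         # extend every survivor by both symbols
                candidate = history + [s]
                err = y - channel_output(candidate, est_taps, k)
                extended.append((cost + err * err, candidate))
        extended.sort(key=lambda p: p[0])
        paths = extended[:M]          # retain only the best M paths
        if k >= depth:
            j = k - depth             # release the symbol that is `depth` steps old
            best_symbol = paths[0][1][j]
            decided.append(best_symbol)
            paths = [p for p in paths if p[1][j] == best_symbol]
    decided.extend(paths[0][1][len(decided):])   # flush the remaining symbols
    return decided


true_taps = [1.0, 0.5]
bits = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1, 1, -1]
rx = [channel_output(bits, true_taps, k) for k in range(len(bits))]
decoded = m_algorithm(rx, true_taps, M=2, depth=5)
```

With a matched estimate and no noise the transmitted path always has zero cost, so even M = 1 recovers it; the interesting behaviour studied in the paper appears only when `est_taps` and the true channel differ.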

III.  Test  Results 

The results from [8] shown in Figure 1 are for a channel
whose transfer function is H(z) = + az. A series of
tests were run to determine what value of M would yield
MLSE performance. We began with a complexity of M = 1
and proceeded to increase M until marginal performance
changes were found under a broad range of mismatch
conditions. Bit error rates are shown as a function of the
value of the parameter a during training. The highest value
of M shown represents the saturating value of M, i.e.,
increasing M further did not significantly alter the curve.
Based on the test results, several observations can be made:

1. M-Algorithm quickly converges to MLSE Performance.
Relatively few paths were required to achieve near-MLSE
performance, in spite of the fact that merged paths were
ignored by our implementation.

2. Increasing M-Algorithm Complexity May Degrade
Performance In Mismatch Conditions. Analogous to results
for other receivers discussed in [8], we feel confident that:

Under channel mismatch conditions, a (well-trained)
full-complexity MLSE receiver will be optimally matched to the
wrong channel, and may therefore achieve a deeper level of
mismatch than a reduced-complexity MLSE scheme.


3.  Eigenvalue  Spread  Is  Useful  In  The  Analysis  of  Channel 
Mismatch.  In  Figure  1,  we  have  superimposed  the  eigenvalue 
spread  of  Channel  C.  As  can  be  seen,  the  point  at  which  the 
MLSE receiver (i.e., MA(5,10)) is no longer optimum
coincides  with  the  point  at  which  the  eigenvalue  spread 
approaches  infinity.  In  general,  this  complexity-inversion 
phenomenon  —  the  situation  when  increasing  a  receiver's 
complexity  could  actually  degrade  its  performance  —  was 
found  to  occur  whenever  the  eigenvalue  spread  during 
training  and  decoding  differed  significantly. 


Figure 1. Performance of M-Algorithm In Channel Mismatch.
Solid curves represent P_b for selected M. Dashed curve
represents the eigenvalue spread, χ.

References 

[1] D. Divsalar, "Performance of Mismatched Receivers on Bandlimited
Channels," Ph.D. Dissertation, UCLA, 1978.

[2] J.B. Anderson and S. Mohan, "Sequential Coding Algorithms: A
Survey and Cost Analysis," IEEE Trans. Commun., Vol. COM-32, No.
2, pp. 169-176, Feb. 1984.

[3] J.B. Anderson and S. Mohan, Source and Channel Coding: An
Algorithmic Approach. Boston: Kluwer, 1991.

[4] J.B. Anderson, "Limited Search Trellis Decoding of Convolutional
Codes," IEEE Trans. Inform. Theory, Vol. 35, No. 5, pp. 944-955,
Sept. 1989.

[5] D. Bauer, H. Speer, and C.M. Zhao, "Performance of Sequential
Detection Schemes for Trellis-Encoded Data Signals Distorted By
Intersymbol Interference," Proc. IEE, Sixth Int. Conf. on Dig. Proc. of
Sig. in Comm., pp. 141-146, Sept. 1991.

[6] F.-Q. Wang and D.J. Costello, "A Hybrid M-Algorithm/Sequential
Detector for Convolutional and Trellis Codes," NASA Tech. Report,
CR-186863, June 29, 1990.

[7] O. Zimmermann and W. Rupprecht, "Adaptive Receiver Structures
With Sequential Detection Algorithms for Digital Mobile Radio
Systems," Proc. IEE, Sixth Int. Conf. on Dig. Proc. of Sig. in Comm.,
Sept. 1991, pp. 141-146.

[8] F. Gozzo, "Robust Sequence Estimation in the Presence of Channel
Mismatch," Ph.D. Dissertation, Rensselaer Poly. Inst., May 1992.


Presented at the 1993 IEEE International Symposium on Information Theory.
This work was supported by the IBM Resident Study Program.




New  Results  in  Signal  Design  for  the  AWGN  Channel 


M.  Steiner 

Naval  Research  Laboratory 


Summary 

There has been a fair amount of work done in the area
of signal design. Unfortunately, there are few results on the
optimality of signal sets (throughout the paper an optimal
signal set is one that maximizes the average probability of
detection). The optimal selection of M signal vectors
embedded in even the most fundamental type of noise, white
Gaussian noise, is not known in general. One of the most
famous of communication conjectures, dating back to 1948,
states that the optimal signal vectors are vertices of an
n-dimensional regular simplex for which M = n + 1. When
the signal vectors are constrained only by an average power
limitation, this conjecture has been referred to as the strong
simplex conjecture (SSC). To avoid confusion, we refer to
the conjecture of simplex optimality when the signal vectors
lie on the surface of a sphere as the weak simplex conjecture
(WSC). The validity of the SSC implies the validity of the
WSC, although the converse statement is not true.
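
As a concrete illustration of the regular simplex signal set named
above, the following sketch builds M equal-energy vectors whose pairwise
normalized inner products all equal -1/(M-1). The construction (centering
the standard basis vectors and normalizing) is one standard choice and is
an assumption here, not taken from the paper; the function names
regular_simplex and inner are hypothetical. The vectors are written in M
coordinates but span only n = M - 1 dimensions, consistent with
M = n + 1.

```python
import math


def regular_simplex(m):
    """Return m unit-energy vectors forming a regular simplex (requires m >= 2)."""
    vectors = []
    for i in range(m):
        # Standard basis vector e_i with the common centroid (1/m, ..., 1/m) removed.
        v = [(1.0 if j == i else 0.0) - 1.0 / m for j in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v])
    return vectors


def inner(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))
```

For example, regular_simplex(4) returns four unit vectors with every
pairwise inner product equal to -1/3.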

Under the assumption that the signal vectors are equal
energy, Balakrishnan proved in his seminal work [1] that the
regular simplex is 1) optimal (in terms of maximizing the
average probability of detection) as the signal to noise
ratio (SNR) λ approaches infinity, 2) optimal as λ approaches
zero, and 3) locally optimal at all λ. He also proved that if
there does exist a signal set which is optimal at all λ it is
necessarily the regular simplex signal set. Dunbridge [2], [3]
in 1967 extended Balakrishnan's work to the case where only an
average power constraint is imposed on the signal set. It was
proven by both Balakrishnan [4] and Weber [5] that the
regular simplex maximizes the minimum distance under a peak
power constraint.

A number of new results are presented. The strong simplex
conjecture is disproven. A signal set is shown which
performs better than the regular simplex at low signal to
noise ratios for all M ≥ 7. This leads to a theorem which
states that in general there is no signal set which is optimal
at all signal to noise ratios. Furthermore it is found that
the optimal solution at low signal to noise ratios is not an
equal energy solution for all M ≥ 7. The regular simplex
is shown to be the unique polytope which maximizes the
minimum distance between signals. This extends the
result by Balakrishnan who proved that the regular simplex
maximizes the minimum distance under a peak power
constraint. This result leads to the corollary that a signal set
which maximizes the minimum distance is not necessarily
optimum. This is an interesting result since much signal
design work has been based on maximizing the minimum
distance due to the inherent simplicity of the criterion. A
simple proof that the regular simplex maximizes the
minimum distance under a peak power constraint is also shown.
The global optimality of the regular simplex under the
performance measure of the union bound on the probability of
detection is addressed. The union bound is often used to
assess the performance of many signal sets at medium to
high SNR when computation of the probability of detection
is intractable. It is proven that the regular simplex uniquely
maximizes the union bound at all signal to noise ratios.
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Importance  sampling  scheme  using  a  fixed  mean  translation 
(MT) without conditioning on the intersymbol interference (ISI)
in a digital communication system is proposed. The reduction
in simulation samples is significant. The MT shift found by an adaptive
algorithm agrees with the one from a numerical method based on
large deviation theory for nonlinear systems.

Introduction 

In a digital communication system the bit error probability
(Pe) is an important performance measure, but often it cannot
be derived analytically in closed form due to the complexity of the
system. For this reason the Monte Carlo (MC) simulation method
has been commonly used. Since the number of MC samples is of
the order of 1/Pe, for a small Pe this number can be quite large.
Importance sampling (IS) is one variance reduction method to
evaluate Pe with the same degree of accuracy but using a smaller
number of samples than the conventional MC method. We will
consider a simplified satellite digital communication system model
with uplink and downlink noise. The nonlinear element of the
system contains a third order term for modeling the saturating
amplifier.

Mean  Translation  Importance  Sampling 

In the MT IS scheme [1] the new random vector n* is obtained
by n* = n + c, where c is a constant vector. Therefore, it is
critical to find a proper value of c in order for the IS technique to
effectively decrease the variance of the sampled data. For the
linear system the most efficient MT shift c_j(opt) can be found easily
if the conditioned ISI pattern is used.
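
The mean translation n* = n + c can be illustrated on a toy problem.
The sketch below is not the satellite model of this paper: it assumes a
scalar standard Gaussian noise sample and estimates the tail probability
P(n > threshold), weighting each shifted sample by the ratio of the
original density to the shifted density. The names q_function,
mt_is_tail_estimate, threshold and shift are hypothetical.

```python
import math
import random


def q_function(t):
    """Exact standard Gaussian tail probability P(n > t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def mt_is_tail_estimate(threshold, shift, num_samples, seed=0):
    """Estimate P(n > threshold) using mean-translated samples n* = n + shift."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(0.0, 1.0) + shift  # sample drawn from the shifted density
        if x > threshold:
            # Weight = N(0,1) density at x divided by N(shift,1) density at x.
            total += math.exp(-shift * x + 0.5 * shift * shift)
    return total / num_samples
```

With shift = 0 the weights are all 1 and the estimator reduces to plain
Monte Carlo. Choosing the shift close to the threshold places most
samples in the error region, which is the effect the MT scheme relies on.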

In order to find a c_j for a nonlinear system we use the result of
large deviation theory and an adaptive scheme. From the large
deviation theory [2] we can use a numerical method which
provides a c_j whose behavior is more desirable than the c_j(opt) of the
linearized version of the system. The adaptive scheme [3] includes
the process of finding c_j in the simulation program estimating
the system performance. Both methods can be used to obtain an
identical result of c_j(opt) when applied to the linear system.

Application 

From analysis we can show that for a linear system the
maximum MT shift which gives variance reductions as compared to
the MC scheme is about twice as large as c_opt. This illustrates the
adverse effect of a large MT shift, while a small amount of shift
is innocuous. We consider unconditioned stream ISI simulation
which generates both the ISI sequence vectors and noise vectors.
Rather than using c_j(opt) for each ISI sequence generated, a fixed
shift c_fix is used for the MT IS. When this method is applied to
our nonlinear model, the c_fix will be taken as the optimum c_j for
the least signal output which can be evaluated.

For systems with memory of length M, unconditioned ISI is
significantly more effective than conditioned ISI simulation as this
scheme does not require 2^M simulations with different c's. An
unconditioned ISI stream simulation is applied to a nonlinear
system with M=12. Results are shown in the following figure for
three different downlink to uplink noise ratios. The efficiency
(i.e., the ratio of the number of MC simulations to the number of
unconditioned stream IS simulations for the same variance) versus Pe is shown.
As can be seen, the efficiency increases for small Pe.

Conclusion 

We presented a practical method to estimate Pe for a nonlinear
system with memory by biasing the Gaussian noise and not
conditioning on the ISI. In order to find the proper MT shift amount,
a numerical method was used. When the adaptive method was
used assuming no knowledge of the ideal shifting value, the
numerically obtained minimum rate point and the mean value of
the error causing noise vector were almost identical.
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Summary 

Performance analysis of discrete time systems often requires
the evaluation of the expected value of a cost function of the
system output or equivalently the expected value of a
functional of the system input vector. In either case, analytical
expressions for the performance in many practical situations
are typically unavailable. As an alternative to approximations
or bounds, Monte Carlo simulations are often employed as a
convergent method for obtaining arbitrarily accurate estimates
of the performance. Unfortunately, in many important
applications, the required computations can often be prohibitive.
Therefore, much recent research has focused on the
development of new and efficient forms of the method of Monte Carlo
known as Importance Sampling simulations.

The fundamental problem in Importance Sampling is to
determine the appropriate statistics for the simulation. It is
well known that the minimum variance statistics cannot be
implemented since they require the explicit knowledge of the
expectation one is attempting to estimate. Therefore, researchers
have worked to determine "good" suboptimal statistics from
probability measure constraint classes which exclude the
degenerate optimal solution. These so-called "biasing strategies"
are typically obtained by minimizing the variance of the
Importance Sampling estimator with respect to the biasing statistics
over a class of probability measures which excludes the optimal
distribution.
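
The reason the minimum variance statistics cannot be implemented can be
made explicit with the standard Importance Sampling identities. The
notation below (original density f, biasing density f*, nonnegative
function g, and unknown expectation θ) is assumed for illustration and
is not taken from the paper.

```latex
% Importance Sampling estimator with samples X_i drawn from f^*:
\hat{\theta}_N = \frac{1}{N} \sum_{i=1}^{N} g(X_i)\,\frac{f(X_i)}{f^{*}(X_i)},
\qquad \theta = \mathrm{E}_f[g(X)] .

% Its variance:
\mathrm{Var}\bigl(\hat{\theta}_N\bigr)
  = \frac{1}{N}\left( \int \frac{g^{2}(x)\, f^{2}(x)}{f^{*}(x)}\,dx - \theta^{2} \right).

% For g \ge 0 the variance is zero when
f^{*}_{\mathrm{opt}}(x) = \frac{g(x)\, f(x)}{\theta},
% which depends on the unknown quantity \theta being estimated.
```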

Recently [3], it was shown by the authors that minimizing
the Importance Sampling variance is equivalent to minimizing
an Ali-Silvey distance [1] or equivalently an f-divergence [2]
between the admissible biasing densities and the well known
optimal biasing density. This result has led to a new approach in
the design of Importance Sampling strategies where one merely
determines the biasing density from an arbitrary constraint
class with minimum "distance" to the global optimal
distribution to minimize the simulation variance.

Extending this previous research, we derive in this work
the minimum variance biasing distribution from a constraint
class whose controlling parameter is fundamental in the
performance analysis of Importance Sampling. In addition, we
will show that for the special case of estimating the
probability of rare events, the constrained optimal biasing distribution
from this class is independent of the unknown parameter and
as such, leads to solutions which are both amenable to
implementation and yet still optimal with respect to a relevant
criterion.

To  motivate  the  proposed  constraint  class,  we  note  that  it 
has  been  analytically  established  for  the  very  important  special 
case  of  estimating  the  probability  of  a  rare  event  that  effective 

* Supported in part by the National Science Foundation under
Grant NCR-9I098S8 and in part by Rome Laboratories under
contract P3O6O3-93-C-0OS3.

^ Supported in part by the IBM Contract 89-S33302 and the Texas
Advanced Technology Grant 003604-018.


Dept. of Electrical and Comp. Eng.
Rice  University 
Houston,  TX  77251-1892 


biasing strategies must generate the rare event with greater
frequency than the statistics of the original Monte Carlo
simulation. In fact, the probability of the rare event with respect to
the Importance Sampling statistics is the controlling parameter
in an achievable lower bound on the variance of the Importance
Sampling simulation. Therefore, generalizing these arguments,
we consider the class of all biasing distributions with
probability mass over an arbitrary Borel subset bounded by some
arbitrary constant.

We eliminate the mathematical formalities and
developments and simply state the major results. The optimum
biasing distribution from any constraint class is that distribution
which minimizes a specific f-divergence to the global optimal
distribution. Via a theorem developed in this work, we derive
the unique biasing distribution from the constraint class
described above which minimizes every f-divergence or Ali-Silvey
distance to the optimal statistics. As a special case, we show
that this distribution minimizes the Importance Sampling
variance among all distributions from this constraint class.
Moreover, it is shown that the savings over standard Monte Carlo
simulations obtained through the use of this distribution can
be made arbitrarily close to the optimal savings by varying the
selection of the parameters of the constraint set.

It  is  shown  that  for  the  case  of  estimating  the  probability 
of  rare  events,  the  constrained  optimal  biasing  distribution  can 
be  made  completely  independent  of  the  parameter  of  interest, 
thus  readily  admitting  to  a  practical  implementation.  We  show 
further  that  the  computational  savings  obtained  through  the 
use  of  this  biasing  distribution  can  be  made  arbitrarily  large 
by  adjusting  the  parameters  of  the  constraint  class. 

We conclude this work with an asymptotic analysis of the
performance gains of these biasing distributions. Necessary and
sufficient conditions are given for the asymptotic gains achieved
through these distributions to become unbounded as the
probability of the rare event diminishes. Furthermore, a
methodology for selecting sequences of Borel bounding sets which
satisfy these conditions and yet render simulations which are
amenable to implementation is presented.
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ABSTRACT 

The problem of finding an importance sampling biasing
density for estimating the performance of an optimal
binary detection system is addressed geometrically. This
geometric approach allows us to find an importance
sampling biasing density for any pair of mutually absolutely
continuous nominal densities unlike other methods which
are problem specific. We prove that the biasing density
lying geometrically halfway between the nominals gives an
asymptotically infinite importance sampling gain as
system performance improves and the probability of error
tends to zero.


1.  BACKGROUND 

Consider the standard binary hypothesis testing problem
of determining which of two hypotheses (H0 and H1) is
true based upon a set of observations u = {u1, ..., un} ∈
Ω, the observation space. P0 and P1 denote the
respective probability measures on Ω that correspond to these
hypotheses. The optimal detector under a variety of
performance criteria is the likelihood ratio test. We focus here
on two criteria: minimum probability of error and
Neyman-Pearson.

Calculation of performance probabilities is usually so
complicated that numerical estimation is required. The
number of Monte Carlo simulations to be performed to
obtain a reliable numerical estimate of the performance
can be very large since it is inversely related to the
performance. A technique known as importance sampling has
been employed to greatly reduce the number of simulations
required to produce accurate estimates [3]. Here,
observations are generated according to an alternate model
specified by the biasing density. The utility of importance
sampling hinges on finding a biasing density that can greatly
reduce the number of required simulations to achieve
accurate performance estimates. A measure used to quantify
this reduction is the importance sampling gain Γ = N/M, the
ratio of the number of trials N required for Monte Carlo
simulations and the number of trials M required for the
importance sampling technique such that the estimates'
variances are equal. In general, for arbitrary nominal
densities, no straightforward method is known to produce a
biasing density that provably yields gain; usually problem
specific, ad hoc methods are used. In this paper, we
employ the natural geometry underlying detection problems
to find the importance sampling biasing density.

2.  GEOMETRIC  IMPORTANCE  SAMPLING 

We assume that P0 and P1 are mutually absolutely
continuous with respect to each other. Let p0 and p1
denote the probability densities of P0 and P1 with respect to
some other absolutely continuous measure P. Employing
the tools of differential geometry, we have analyzed
elsewhere [1, 2] the non-Riemannian geometry of the space
of all probability measures on Ω mutually absolutely
continuous with respect to P0 and P1. In this geometry, the
natural path, the so called geodesic connecting p0 and p1,
is the exponential mixture density


$$p_u(\mathbf{u}) = \frac{p_0^{1-u}(\mathbf{u})\, p_1^{u}(\mathbf{u})}{J_u}, \qquad 0 < u < 1. \tag{1}$$


The normalization factor J_u is a strictly convex
function; hence, there exists a unique 0 < v < 1 so that
J_v = inf_{0<u<1} J_u. The density p_v has the property that
it lies midway between p0 and p1 in the sense of
Kullback-Leibler information. We have shown that choosing p_v
consistently yields significant importance sampling gain.
Under the minimum probability of error criterion, the gain Γ
is lower bounded by the reciprocal of the error probability
raised to a constant power. Asymptotically, as
performance improves, the importance sampling gain tends to
infinity. Under the Neyman-Pearson criterion, the density
p1 is used in Monte-Carlo simulations. When the
observations are IID, we have shown that computationally efficient
estimates of the miss probability Pr[P0|P1] occur when
importance sampling schemes employ the other nominal
p0. The importance sampling gain is bounded according
to ln Γ/n > I(P1'|P0'), where I(P1'|P0') is the
Kullback-Leibler information between observations' marginal
distributions.

The biasing densities corresponding to some standard
problems are very interesting. When the nominal
densities are equal-variance Gaussians with means m0
and m1, the geometric importance sampling biasing
density equals a Gaussian with mean (m0 + m1)/2 with the variance
remaining the same. The geometric biasing densities
corresponding to nominals distributed with a Cauchy or the
Generalized Gaussian density are multimodal. When the
nominals are homogeneous Poisson densities with means
λ0 and λ1, then a Poisson density with mean (λ1 − λ0)/ln(λ1/λ0)
is the biasing density. When the nominals are multivariate,
shifted-mean (the two hypotheses correspond to
deterministic signals in additive noise), densities symmetric about
the mean, then v = 1/2.
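
The standard cases above can be checked numerically. The sketch below
assumes that the exponential mixture is the normalized geometric mixture
p0^(1-u) p1^u / J_u and uses closed forms for log J_u that follow from
that assumption: for Poisson nominals, log J_u = -(1-u)λ0 - uλ1 +
λ0^(1-u) λ1^u, and for equal-variance Gaussian nominals, log J_u =
-u(1-u)(m1-m0)^2/(2σ^2). All function names are hypothetical.

```python
import math


def minimize_on_unit_interval(func, tol=1e-10):
    """Golden-section search for the minimizer of a convex func on (0, 1)."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        a = hi - ratio * (hi - lo)
        b = lo + ratio * (hi - lo)
        if func(a) < func(b):
            hi = b  # minimizer lies in [lo, b]
        else:
            lo = a  # minimizer lies in [a, hi]
    return 0.5 * (lo + hi)


def poisson_geometric_rate(lam0, lam1):
    """Return (v, mean) of the Poisson biasing density minimizing J_u."""
    def log_j(u):
        return -(1.0 - u) * lam0 - u * lam1 + lam0 ** (1.0 - u) * lam1 ** u
    v = minimize_on_unit_interval(log_j)
    # The geometric mixture of two Poisson laws is Poisson with this mean.
    return v, lam0 ** (1.0 - v) * lam1 ** v


def gaussian_geometric_mean(m0, m1, sigma):
    """Return (v, mean) of the Gaussian biasing density minimizing J_u."""
    def log_j(u):
        return -u * (1.0 - u) * (m1 - m0) ** 2 / (2.0 * sigma ** 2)
    v = minimize_on_unit_interval(log_j)
    # The geometric mixture keeps the variance and interpolates the mean.
    return v, (1.0 - v) * m0 + v * m1
```

Under these assumptions the Gaussian case returns v = 1/2 and the
midpoint mean, while the Poisson case returns a mean equal to
(λ1 - λ0)/ln(λ1/λ0).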
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Abstract 

We  investigate  transmission  of  correlated  information 
sources  over  an  interference  channel.  A  coding  scheme 
for  matching  the  source  to  the  channel  is  developed  and 
sufficient  matching  conditions  between  the  source  and  the 
channel are derived.

1  Introduction 

So far, the problem of transmission of correlated sources
over communication channels has been investigated for
multiple-access channels [1], broadcast channels [2] and
two-way channels [3]. In this work we study this problem
for  the  interference  channel.  The  existence  of  correlation 
between  sources  makes  it  possible  for  the  encoders  to 
partially  cooperate  and  this  partial  cooperation  results  in 
better  performance  compared  to  the  case  of  independent 
messages.  An  encoding  scheme  is  proposed  and  based  on 
this  scheme  sufficient  conditions  for  reliable  transmission 
of  correlated  sources  over  an  interference  channel  are 
obtained.  The  results  are  then  applied  to  two  classes 
of  interference  channels  for  which  the  capacity  region  is 
already  known. 

2  Main  Results 

The interference channel is a mathematical model for
communication between M transmitter-receiver pairs over a
single communication medium. The existence of
correlation between the two information sources makes it
possible that the encoders, although located separately,
cooperate to some extent. The cooperation between the
encoders can be employed in designing improved codes.
We employ an encoding scheme based on the
covering lemma and correlation preserving coding. In
the first stage of encoding we use covering to represent
the information about each source embedded in the other
source by an auxiliary random variable. The next step is
to provide partial cooperation between the encoders. The
codewords generated in this step statistically depend on
the information content of each source output and the
auxiliary random variable representing the information
about the other source output. The decoding scheme and
error analysis are based on using the properties of jointly

This work was supported by the National Science Foundation Grant
NC31-9I013<0 


typical sequences. Our main result is given in the following
theorem.

Theorem: Let a discrete-memoryless interference
channel be denoted by input alphabets $\mathcal{X}_1, \mathcal{X}_2$,
output alphabets $\mathcal{Y}_1, \mathcal{Y}_2$, and a probability transition
matrix $p^*(y_1, y_2 | x_1, x_2)$, where

$$p^*(y_1^n, y_2^n | x_1^n, x_2^n) = \prod_{i=1}^{n} p^*(y_{1i}, y_{2i} | x_{1i}, x_{2i}),$$

and let two correlated information sources be generated
according to independent
drawings of random variables S and T with joint PMF
$p^*(s,t)$. Then, if there exist two auxiliary random
variables U and V such that

$$p(q,s,t,u,v,x_1,x_2,y_1,y_2) = p^*(s,t)\,p(q)\,p(u|s,q)\,p(v|t,q)\,p(x_1|s,u,q)\,p(x_2|t,v,q)\,p^*(y_1,y_2|x_1,x_2)$$

and

$$\begin{aligned}
H(S|UVQ) &< I(Y_1; S X_1 | UVQ) \\
H(S|VQ) &< I(Y_1; S X_1 | VQ) \\
H(S|UVQ) + I(T; V | UQ) &< I(Y_1; S V X_1 | UQ) \\
H(S|VQ) + I(T; V | Q) &< I(Y_1; S V X_1 | Q) \\
H(T|UVQ) &< I(Y_2; T X_2 | UVQ) \\
H(T|UQ) &< I(Y_2; T X_2 | UQ) \\
H(T|UVQ) + I(S; U | VQ) &< I(Y_2; T U X_2 | VQ) \\
H(T|UQ) + I(S; U | Q) &< I(Y_2; T U X_2 | Q)
\end{aligned}$$

the correlated sources (S, T) can be reliably transmitted
via the interference channel.

These  results  are  then  applied  to  some  special  cases  for 
which  the  capacity  of  the  interference  channel  is  already 
known. 
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Abstract 

We find the capacity region of a two-user Gaussian multiaccess
channel with intersymbol interference (ISI) where the inputs pass through
respective linear systems and are then superimposed before being
corrupted by an additive Gaussian noise process.

We give a novel geometrical method to obtain the optimal input
power spectral densities and the capacity region. This method can
be viewed as a nontrivial generalization of the single-user water-filling
argument. We show that as in the traditional memoryless
multiaccess channel, FDMA, with optimally selected frequency bands for each
user, achieves the total capacity of the K-user Gaussian multiaccess
channel with ISI. However, the capacity region of the two-user channel
with memory is, in general, not a pentagon unless the channel transfer
functions for both users are identical.

Summary 

In a recent paper [1], a limiting expression for the capacity regions
of multiaccess channels with memory was obtained. Such a limiting
expression was explicitly evaluated for some channels with memory in
[1] and [2]. In [3], we show that the limiting expression of [1] can be
used to obtain a computable capacity region formula for Gaussian
linear multiple-access channels with finite ISI. We extend the single-user
water-filling argument to the two-user case and derive a geometrical
method to obtain the optimal input power spectral densities (PSDs).
We show that the optimal input PSDs of the users that maximize
the total capacity do not overlap in the frequency domain. As in the
traditional memoryless multiaccess channel, FDMA with optimally
selected frequency bands and optimally shaped PSDs achieves the total
capacity.

We consider a general Gaussian multiaccess channel with ISI

$$Z_i = \sum_{k=1}^{K} \sum_{j \ge 0} h_{k,j}\, X_{k,i-j} + N_i,$$

where $Z_i$ is the output of the channel, $X_{k,i}$ is the $i$-th symbol sent
by user $k$, $h_{k,j}$ are the ISI coefficients of user $k$, and $N_i$ is a
zero-mean m-dependent stationary Gaussian
noise process (i.e., $R_n = 0$ for all $|n| > m$). We assume that the $k$-th user
has power constraint $W_k$ and all the channels seen by the users have
finite-length impulse responses with length less than or equal to n.

The capacity of a single-user Gaussian channel with ISI is obtained
using the Karhunen-Loeve expansion. This expansion decomposes
the channel into independent parallel memoryless Gaussian channels
whose capacities are well known, thereby reducing the problem to one
of optimal power allocation into various channels. It is crucial to note
that the kernel used in the Karhunen-Loeve expansion depends on the
ISI coefficients. In the two-user Gaussian channel with ISI, there are
two sets of ISI coefficients, one for each user. If the channels seen by
the users are identical, the traditional procedures can be applied and
the capacity region has been obtained in [4, 5]. If the sets of ISI
coefficients are not the same, a similar decomposition cannot be applied
since no kernel can simultaneously decompose the signals from both


We also extend the single-user water-filling argument to the
two-user case. We derive a geometrical method to obtain the optimal input
PSDs. It turns out that this geometrical argument can be explained
via two main ideas: the equivalent channel idea and the successive
decoding idea (decode one user's information while treating the other
user's information as noise first and then decode the remaining user's
information). The equivalent channel idea bears some resemblance
to the single-user water-filling argument in the sense that it obtains
graphically the optimal input power distribution over the frequency
domain. It can be applied directly to the single-user channel to obtain
the optimal input PSD. Roughly speaking, in the two-user case, the
equivalent channel idea determines graphically the optimal
distribution of the total power over the frequency domain, while the successive
decoding idea determines, again graphically, the optimal split of the
total power among the users for each frequency.
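
For reference, the single-user water-filling argument that this method
generalizes can be sketched on a discretized frequency axis. This is a
minimal illustration under assumptions of our own, not the two-user
geometrical method of the paper: each bin i has an inverse gain playing
the role of the noise PSD divided by the squared transfer function
magnitude, and the water level is found by bisection. The names
water_fill, inverse_gains and total_power are hypothetical.

```python
def water_fill(inverse_gains, total_power, iterations=100):
    """Single-user water-filling over discrete frequency bins (sketch).

    Returns (level, powers) with powers[i] = max(level - inverse_gains[i], 0)
    and sum(powers) approximately equal to total_power.
    """
    lo = min(inverse_gains)                # at this level no power is used
    hi = max(inverse_gains) + total_power  # at this level at least total_power is used
    for _ in range(iterations):
        level = 0.5 * (lo + hi)
        used = sum(max(level - g, 0.0) for g in inverse_gains)
        if used > total_power:
            hi = level
        else:
            lo = level
    level = 0.5 * (lo + hi)
    return level, [max(level - g, 0.0) for g in inverse_gains]
```

For inverse gains [1.0, 2.0, 5.0] and a total power of 3, the water
level is 3: the first two bins receive powers 2 and 1, and the third bin
receives none.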

In particular, we show that the optimal input PSD pair achieving
the total capacity can be obtained graphically using the equivalent
channel idea alone. Moreover, the optimal PSD pair do not overlap;
hence, as in the memoryless multiaccess channel, FDMA, with
optimally selected bands and optimally shaped PSDs, achieves the total
capacity of the multiaccess channel with ISI.


Theorem  1 

For any K-user m-dependent Gaussian multiaccess channel with
finite intersymbol interference and power constraints $W_1, \ldots, W_K$, the
total capacity can be achieved by FDMA with optimal input PSD
K-tuple $(S_1(w), \ldots, S_K(w))$, where


5i(tn) 


Sk(w) 


Sji(w) 

h 

Ml  -  if  biT^:\w)  <  biTrHw)  for  I  #  k, 

1  0  otherwise. 


$T_k(w) = |H_k(w)|^2 / N(w)$ is the magnitude square of the transfer
function over the noise PSD, and $b_1, \ldots, b_K$ are chosen such that


1  /■”  - 
Jr  Jo 


w)dw  =  6tIUn 


for k = 1, ..., K.
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users. 

Therefore, in order to obtain the result in the multiuser case, a
new approach based on the circular channel methods of [2] and {0)
is employed. This approach enables an orthogonal decomposition of
the channel using the discrete Fourier transform which is independent
of the ISI coefficients. In this paper, we employ these ideas and the
limiting expression for the capacity region of multiaccess channels with
memory in [1] to obtain the capacity region of the Gaussian multiaccess
channel with ISI.
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Abstract 

The  capacity  C{L)  of  the  arbitrarily  varying  chan¬ 
nel  for  deterministic  list  codes  of  fitei  list  sise  L  is 
considered  under  the  average  probability  of  error  cri¬ 
terion.  When  the  random  coding  capacity  Cr  is  pos¬ 
itive,  it  is  shown  that 


is invariant under all permutations of the arguments
x, x_1, ..., x_m, for all y, x, x_1, ..., x_m. By definition,
we take all AVCs to be 0-symmetrizable. □

Remarks: (1) This definition generalizes the symmetrizability
condition of [2]. (2) If an AVC is m-symmetrizable
then it is also m'-symmetrizable for
all 0 ≤ m' ≤ m.


C(L) = \begin{cases} C_r, & L > L^*, \\ 0, & L \le L^*, \end{cases}


where L*, called the symmetrizability, is a computable,
non-negative integer. Thus L* + 1 is the smallest list
size for which a positive rate is achievable.


Theorem 1: If an AVC is m-symmetrizable then


u.  /(S  A  YIX) 

min{log  13^1, log |5|}  . 


Summary 

At the 1990 IEEE Information Theory Workshop,
M.S. Pinsker conjectured the following theorem: For
an arbitrarily varying channel (AVC), every rate below
the random code capacity C_r is achievable with
deterministic list codes of constant list size, if the
average probability of error criterion is used. Ahlswede
and Cai [1] proved this conjecture by showing that
codes exist with rate R, list size L, and average
probability of error λ for all


O 

Definition 2: The maximum m for which the AVC
is m-symmetrizable is called the symmetrizability and
is denoted by L*. □

Theorem 1 and Definitions 1 and 2 imply a computable
characterization of L*.

Theorem 2: The list-L capacity of the AVC under
the average probability of error criterion is

C(L) = \begin{cases} 0, & L \le L^*, \\ C_r, & L > L^*. \end{cases}    (2)


L > \frac{\log |\mathcal{S}|}{\lambda E(R)}, \qquad R < C_r,    (1)


O 

Example: For \mathcal{X} = \mathcal{S} = {0, 1}, \mathcal{Y} = {0, 1, 2} and


where E(R) is the random coding reliability function
of the AVC, and \mathcal{S} is the channel state alphabet. This
list size depends on R, λ, and \mathcal{S}, and grows without
bound as λ → 0 or R → C_r.

The contribution of this paper is to show that
Pinsker's conjecture holds for a constant list size L
that depends only on the channel and not on R, λ, or
\mathcal{S}. Moreover, we determine the smallest list size for
which the conjecture is valid.

Consider a discrete, memoryless AVC with transition
probability W : \mathcal{X} × \mathcal{S} → \mathcal{Y}. Let X, S, and Y be
random variables, with p.m.f. P(x)Q(s)W(y|x,s).

Definition 1: An AVC is m-symmetrizable if there
exists a channel U : \mathcal{X}^m → \mathcal{S} such that the channel
V : \mathcal{X}^{m+1} → \mathcal{Y} defined by

V(y | x, x_1, ..., x_m) = \sum_{s \in \mathcal{S}} W(y | x, s) U(s | x_1, ..., x_m)

Supported in part by ARO Grant DAAL03-8».K-0130.


W(y | x, s) = \begin{cases} 1, & y = x + s, \\ 0, & y \ne x + s, \end{cases}    (3)


C_r = 0.5 bits/channel use, and L* = 1. Consequently,
there exist codes for all R < 0.5 such that
the receiver can reliably narrow the transmitted message
to one of two possibilities, but no further.

References 


[1] R. Ahlswede and N. Cai, "Two proofs of Pinsker's
conjecture concerning AV channels," IEEE Transactions
on Information Theory, IT-37 (6), pp. 1647-1649,
November 1991.

[2] I. Csiszár and P. Narayan, "The capacity of the
arbitrarily varying channel revisited: positivity,
constraints," IEEE Transactions on Information
Theory, IT-34 (2), pp. 181-193, March 1988.




Coding  Strategies  for  the 
Permuting  Jammer  Channel 

by 
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SUMMARY 

The  difficulty  of  determining  results  on  the  capacity  of  finite- 
state  channels  with  memory  resides  with  the  added  complexity 
due  to  the  memory.  The  natural  approach  is  to  study  a  simple 
model.  The  Permuting  Channel  is  one  such  model.  The  best  way 
to  introduce  the  Permuting  Jammer  Channel  is  by  an  example. 
Consider Blackwell's Trapdoor channel [1].

Example.  Consider  the  case  of  two  trapdoors  with  each  door 
having  the  same  probability  of  being  opened,  as  shown  in  Figure 
1.  Initially  (Fig.  la),  a  ball  labeled  either  0  or  1  is  present  in  one 
of  the  two  slots.  Then  (Fig.  lb)  a  ball,  either  a  0  or  1,  is  placed 
in  the  empty  slot,  after  which  (Fig.  Ic)  one  of  the  trapdoors 
opens.  The  ball  lying  above  the  open  door  then  falls  through. 
The  door  closes  (as  in  Fig.  la)  and  the  process  is  repeated. 


Figure 1. Blackwell's trapdoor channel [1], panels (a), (b), (c).

We  generalize  this  example  as  follows; 

Definition. The permuting channel is a finite-state channel
consisting of β + 1 trapdoors with each trapdoor compartment
capable of holding only 1 character. At the start, β of these
compartments are occupied. The alphabet consists of α characters.
The initial state, S_0, represents the nature of the β characters
at the start.

The operations of this channel, as introduced by Ahlswede
and Kaspi [2], depend on three participants, as shown in Figure 2.

Figure  2.  I  -  Sender,  O  -  Trapdoor  Selector,  R  -  Receiver. 

Definition.  In  the  permuting  jammer  channel  (PJ  Channel), 
O  acts  as  a  jammer  in  frustrating  I,  the  message  sender,  by 
scrambling  the  output.  Here,  we  shall  consider  the  condition 
that  the  initial  state  is  known  to  all  users. 

In [2], Ahlswede and Kaspi found the capacity of the PJ
channel in the binary case. Piret [3] solved the case of β =
1 memory. In this paper, the capacity for the case α = 3 is
established.

To  find  the  capacity,  we  follow  the  lead  of  Ahlswede  and 
Kaspi,  of  constructing  maximal  codes  for  particular  input  length 
n.  Let  us  define  certain  properties  of  codes. 


Definition. A code is called maximal for a particular input length
n if no other code has larger cardinality.

Definition.  A  code  U  is  packed  or  is  a  packed  code  if  it  cannot 
accommodate  any  other  codewords. 

A  maximal  code  is  necessarily  packed,  but  a  packed  code 
need  not  be  maximal.  Packedness  is  useful  as  a  first  test  for 
maximality. 

Definition. A repeated code is the cartesian product U^m of a
code U.


Ahlswede and Kaspi [2] established their result by considering
U_1(α, β) := {11···1, 22···2, ..., αα···α}, in which each word has
length β + 1, and verifying that U_1(2, β)^m is a maximal code for
length n = (β + 1)m.


We  first  note  that 


Lemma. For α ≥ 2, β ≥ 1 and arbitrary S_0, U_1(α, β) is the
only maximal code for n = β + 1.

Thus, in the binary case, the repeated code, U_1(2, β)^m, is
maximal. We want to explore conditions on maximal codes
which yield maximal repeated codes.

As a matter of notation, we let S_0|x → y denote that the
output y is reachable from input x and initial state S_0.
Furthermore, let Y(S_0|x) denote the cover of x, i.e., the set of all
possible y reachable from x, and Y(S_0|U) = ∪_{x ∈ U} Y(S_0|x).

Definition. A set U ⊂ C^n has the Universal Property if for all
S_1, S_2 ∈ S and for all x ∈ C^n, Y(S_1|x) ∩ Y(S_2|U) ≠ ∅, i.e., there
exist u ∈ U and y ∈ C^n such that

S_1|x → y and S_2|u → y.

A code U is a Universal Code or is said to be universal if it
has the Universal Property.
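
The cover Y(S_0|x) defined above can be enumerated by brute force for small parameters. The following is a minimal sketch, not the authors' method: the function names `cover` and `is_code` are hypothetical, the selector is assumed to be free to release any one of the β + 1 characters held at each step, and a code is assumed to be a set of words whose covers are pairwise disjoint.

```python
def cover(s0, x):
    """Return the set of outputs reachable from input x and initial state s0.

    s0 : tuple of beta characters initially held in the compartments
    x  : input sequence (string); one character is inserted per step
    Assumption: at each step the selector may release any one of the
    beta + 1 characters currently held.
    """
    # Each frontier entry pairs a state (sorted multiset) with the output so far.
    frontier = {(tuple(sorted(s0)), "")}
    for ch in x:
        nxt = set()
        for state, out in frontier:
            held = list(state) + [ch]        # beta + 1 occupied compartments
            for released in set(held):       # selector opens one trapdoor
                rest = held.copy()
                rest.remove(released)
                nxt.add((tuple(sorted(rest)), out + released))
        frontier = nxt
    return {out for _, out in frontier}


def is_code(s0, words):
    """Assumed criterion: covers of distinct codewords are pairwise disjoint."""
    covers = [cover(s0, w) for w in words]
    return all(covers[i].isdisjoint(covers[j])
               for i in range(len(covers))
               for j in range(i + 1, len(covers)))
```

For the binary alphabet {0, 1} with β = 1 and initial state ("0",), the words "00" and "11" have disjoint covers under these assumptions, whereas "00" and "01" do not.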

Universal codes are packed. To see this, just let S_1 = S_2.
Furthermore, repeated Universal codes are also Universal codes.

Definition. By extending a set Z' ⊂ C^n to a code Z
we mean to augment each sequence z_i ∈ Z' by adjoining to it
attachment words c_{i,j} such that Z = {z_{i,j} = z_i c_{i,j}} is a
code. If there are in total m c_{i,j}'s adjoined to z_i, we say that z_i
contributes m codewords to Z.

Definition. Let U ⊂ C^n be a code, k = |U|, and Z' ⊂ C^n a
set extended to a code Z. A maximal code U is strongly
maximal if any set V of at most k sequences with pairwise
non-disjoint covers, i.e., V = {z_1, z_2, ..., z_l} ⊂ Z' such that
Y(S_0|z_i) ∩ Y(S_0|z_j) ≠ ∅, contributes at most k codewords.

We  note  that  strongly  maximal  codes  are  universal  codes. 
The  truth  of  the  converse  statement  is  not  known. 


Result  1.  Repeated  strongly  maximal  codes  are  maximal. 

This establishes strong maximality as a sufficient condition
for constructing maximal repeated codes. Is there a weaker
condition? Examples of strongly maximal codes are hard to find.
For the ternary character set we found U_1(3, β) to be strongly
maximal, and thus U_1(3, β)^m is maximal for n = (β + 1)m. Thus

Result 2. For α = 3, β ≥ 1 and for all S_0 the capacity of the
permuting jammer channel is C_J(3, β, S_0) = log 3 / (β + 1).
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SOURCE  DIVISIBILITY  AND  RELATED  PROBLEMS 
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Vavilov  str.  40 
117333  Moscow,  Russia 

Multi-level  coding  schemes  reveal  a  special 
information-theoretical  problem  called  the  divisi¬ 
bility  problem  that  is  connected  to  a  tree-like  struc¬ 
ture  of  the  codes.  It  can  be  traced  in  source  and 
channel  coding  as  well  as  to  hierarchical  cover¬ 
ing  metric  spaces.  In  the  broadcast  source  coding 
scheme, divisibility looks like a problem of additional
information necessary to achieve per-letter
distortion ε_2 < ε_1, provided that the distortion level
ε_1 has been already achieved. We say that the
source is divisible for the pair (ε_1, ε_2) if this
additional amount of information (per source letter)
is equal to R(ε_2) − R(ε_1), where R(ε) is the
rate-distortion function of the source. It is easy to show
that the source is divisible if and only if the
divisibility equation, which relates the test-channel
matrices q_ε of the source at the two distortion levels,
has a stochastic solution s. The divisibility equation was
introduced by Koshelev [Probl. Peredachi Inform.,
3, 1980, pp. 31-49, in Russian]; recently it was
also described by Equitz and Cover (IEEE Trans.
on IT, 2, 1991, pp. 269-275).

To solve the equation we propose a method of
local divisibility letting ε_1 − ε_2 → 0. Using
it we find strong conditions under which an equiprobable
ternary memoryless source with balanced distortion
measure is divisible. We discuss divisibility
of metric spaces and construction of hierarchical
ε-nets. Finally we consider multi-level channel coding
and the problem of channel-input divisibility.
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Abstract 

X and Y are random variables. A person who knows
X wants to convey it to another person who knows Y.
What is the minimum number, L(X|Y), of bits he must
transmit on the average? Let G be an appropriately
defined communication hypergraph of X and Y, and let
H(G, X) be its graph entropy. We show that for all
(X, Y) pairs, L(X|Y) ≥ H(G, X) and that for a large
class of (X, Y) pairs L(X|Y) ≤ H(G, X) + logc + 1.

Summary 

X is a random variable. A person who knows X wants
to convey it to another. What is the minimum number,
L(X), of bits he must transmit on the average? The
well known answer to this question is

H(X) ≤ L(X) ≤ H(X) + 1.

Surprisingly, the answer to the following, closely
related question, is not known. X and Y are random
variables. A person who knows X wants to convey it to
another person who knows Y. Feedback is not allowed.
What is the minimum number, L(X|Y), of bits he must
transmit on the average?

As usual, we assume that both persons agree in
advance on an encoding of X that must be prefix free given
the value of Y. Standard reasoning yields

H(X|Y) ≤ L(X|Y) ≤ H(X) + 1.


the lower bound holds for all (X, Y) pairs, and that
the upper bound holds for a large class of (X, Y) pairs.
We do not know whether the upper bound holds for all
(X, Y) pairs.

Graph entropy was introduced by J. Körner [1]. It
was recently applied to prove lower bounds on perfect
hashing, circuit complexity, and sorting. Yet these
proofs used technical properties of this formally-defined
(see below) functional and did not shed light on its
underlying meaning. Our bounds provide an intuitive
interpretation for this increasingly prevalent measure.

We conclude the summary by defining the communication
hypergraph and its graph entropy. Let (X, Y) be
distributed over \mathcal{X} × \mathcal{Y} according to some probability
distribution p(x, y). The communication hypergraph G of X
and Y has \mathcal{X} as its vertex set, and for every y ∈ \mathcal{Y} it has
the hyperedge e_y = {x : p(x, y) > 0}. This communication
hypergraph is equivalent to a graph defined by [2]
and used in [3] to analyze the number of bits needed
in our problem if the two persons are allowed to
communicate back and forth. This paper concerns only the
one-way version of this problem.

Let G be a hypergraph and let X be a random variable
ranging over its vertices (in our problem the vertices
of G were defined by Y). A set of vertices of G is
independent if no two of its members belong to the same
edge. Denote by Γ(G) the collection of independent sets
in G. The hypergraph entropy H(G, X) of G and X is

H(G, X) = min{ I(X; Z) : X ∈ Z ∈ Γ(G) }.


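Yet a simple example shows that neither bound is tight.
For ε ∈ [0, 1) define

p_ε(x, y) = \begin{cases} (1 - ε)/n, & \text{for } x = y, \\ ε / (n(n - 1)), & \text{for } x \ne y, \end{cases}

where x and y range over a common set of n values,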


and let (X, Y) be distributed according to p_ε(x, y).
Then H(X) = log n, while H(X|Y) = h(ε) + ε log(n − 1).
For ε = 0 it is easy to show that L(X|Y) = 0 = H(X|Y)
whereas for ε > 0 we have L(X|Y) = log n = H(X).
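
The two entropy values quoted in this example can be checked numerically. The sketch below is illustrative only: it assumes the joint distribution places mass (1 − ε)/n on each pair with x = y and ε/(n(n − 1)) on each pair with x ≠ y, over n values of X and Y, which is the form consistent with the stated H(X) and H(X|Y). All function names are hypothetical.

```python
import math


def joint_pmf(n, eps):
    """Assumed joint pmf: (1-eps)/n on the diagonal, eps/(n(n-1)) elsewhere."""
    return {(x, y): (1 - eps) / n if x == y else eps / (n * (n - 1))
            for x in range(n) for y in range(n)}


def entropy_x(p, n):
    """H(X) in bits, computed from the marginal of X."""
    px = [sum(p[(x, y)] for y in range(n)) for x in range(n)]
    return -sum(q * math.log2(q) for q in px if q > 0)


def cond_entropy_x_given_y(p, n):
    """H(X|Y) in bits, computed as -sum p(x,y) log2 p(x|y)."""
    h = 0.0
    for y in range(n):
        py = sum(p[(x, y)] for x in range(n))
        for x in range(n):
            q = p[(x, y)]
            if q > 0:
                h -= q * math.log2(q / py)
    return h


def binary_entropy(e):
    """h(e) in bits, with h(0) = h(1) = 0."""
    if e in (0, 1):
        return 0.0
    return -e * math.log2(e) - (1 - e) * math.log2(1 - e)
```

With n = 4 and ε = 0.25, `entropy_x` agrees with log n and `cond_entropy_x_given_y` agrees with h(ε) + ε log(n − 1) up to floating-point error; with ε = 0 the conditional entropy is zero.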

In this paper we provide a partial solution to the
above question. We define G, the communication
hypergraph of X and Y, and show that for many (X, Y)
pairs, L(X|Y) is roughly H(G, X), the graph entropy of
G and X. More precisely, we show that in the inequality


Namely, it is the minimum mutual information between
X and any random variable Z ranging over independent
sets of G and constrained to always contain X.
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Abstract 

Suppose that we have an information storage network
with m users and n disks, each disk having the
same capacity. Different users are connected to
arbitrary (perhaps overlapping) subsets of the n disks,
and some of the disks might fail. We wish to encode
a binary information sequence such that for a
specified m-tuple (λ_1, ..., λ_m), the ith user can reliably
recover the first λ_i bits of the sequence.

There is a natural upper bound on each individual
λ_i. If this bound can be attained simultaneously for
each of the m users, we say that sequential refinement is
possible. We find necessary and sufficient conditions
for a storage network to admit sequential refinement.

Summary 

Consider an information storage network with m users,
U_1, U_2, ..., U_m, connected to arbitrary (perhaps
overlapping) subsets S_1, S_2, ..., S_m of a set of n disks,
{D_1, D_2, ..., D_n}. For 1 ≤ i ≤ m, let n_i = |S_i|. For
simplicity, we assume that the n disks all have the same
capacity, C bits. If user U_i is guaranteed access to just
k_i disks of S_i at any given time (k_i ≤ n_i), then he can
hope to recover at most k_i C bits of information reliably.

We wish to encode an r-bit information sequence
(y_1, ..., y_r) so that each user U_i (1 ≤ i ≤ m)
can reliably recover the first λ_i bits of the sequence,
(y_1, ..., y_{λ_i}). For which m-tuples (λ_1, ..., λ_m) is such
an encoding possible?

Ideally, we might hope to achieve (λ_1, ..., λ_m) =
(k_1 C, ..., k_m C). In such a case, we say that sequential
refinement is achievable. Unfortunately, this is not
always possible. We find necessary and sufficient conditions
for a network to admit sequential refinement. Before
stating the conditions (Theorem 1), we give several
definitions.


Remark: It follows from Definitions 3 and 4 that
N_i(s) = n_i for k_i ≤ s ≤ n_i.

The  conditions  for  sequential  refinement  can  be  shown 
to  be  related  to  the  triangularity  of  certain  incidence 
matrices  determined  by  the  network  topology.  We 
therefore  make  the  following  definition. 

Definition 5 User U_i (1 ≤ i ≤ m) is triangular if
N_i(s) < s for all s ≤ k_i − 1. A storage network is fully
triangular if each user is triangular.

Theorem 1 Assume that each disk in a storage network
of n disks has capacity C bits. If the storage network
admits sequential refinement, it is fully triangular;
the converse holds if C > n, or if k_max < n/2 and C >
k_max log_2

In  practice  the  number  of  bits  per  disk,  C,  will  gener¬ 
ally  be  much  larger  than  the  number  of  disks,  n.  Thus  in 
all  cases  of  practical  interest,  a  network  admits  sequen¬ 
tial  refinement  if  and  only  if  the  network  is  fully  triangu¬ 
lar.  The  necessity  of  this  condition  follows  from  simple 
information-theoretic  arguments.  The  sufficiency  can 
be  established  by  a  constructive  scheme  that  encodes 
information  using  linear  algebra  over  Galois  fields. 


Definition 1 Let k_max = max_{1 ≤ i ≤ m} k_i.

Definition 2 Let E be the set of edges between users
and disks; i.e.,

E = {(i, j) : User U_i is connected to disk D_j}.

Definition 3 For 1 ≤ j ≤ n, associate with disk D_j
the disk degree


Definition 4 For 1 ≤ i ≤ m and 1 ≤ s ≤ n_i, let N_i(s)
be the number of disks D_j connected to user U_i that
satisfy the inequality d_j ≤ s.
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Abstract 

The Huffman algorithm was originally devised to
construct the minimal expected length instantaneous
source code for a random variable. In this paper, we
identify the Hardy-Littlewood-Polya inequality as the
key step in the proof of the optimality of the Huffman
algorithm and use this to provide a unified framework for
various applications of the Huffman algorithm.
Quantities that are analogous to entropy are identified for
these applications. We also consider the case when the
weights of the leaves of the tree are independent random
variables. A Huffman algebra is defined, which provides
conditions under which the Huffman algorithm is
optimal for the case of random weights. In particular, it is
shown that the "most balanced" tree is optimal for the
case of independent and identically distributed weights
for any arrangement increasing Huffman algebra.

The Huffman coding algorithm is a greedy bottom-up tree building
algorithm that constructs the optimum source code for a random
variable, i.e., it finds a binary tree that minimizes the weighted
lengths Σ_i p_i l_i, where p_i is the probability of symbol i and l_i is the
depth of the corresponding leaf. The same greedy tree building
algorithm was later applied to construct trees that minimize other
tree functionals such as max_i (p_i + l_i), which has applications in
circuit design and parallel processing [3].

In this paper, we provide a unified framework for the different
applications of the Huffman algorithm and extend some of the
results to the case when the weights of the leaves are independent
random variables. In the proof of the optimality of the Huffman
algorithm, the key step turns out to be a rearrangement inequality
called the Hardy-Littlewood-Polya inequality, which states that for
any real numbers a, b, c and d with a ≤ b and c ≤ d,
a·d + b·c ≤ a·c + b·d. This inequality is used to show that the
longest codewords are associated with the lowest weights.
Deterministic weights: Motivated by this inequality,
we consider a two operator algebra (S, ⊕, ⊗) that satisfies this
rearrangement inequality with + replaced by ⊕ and · replaced
by ⊗. We call these algebras arrangement increasing Huffman
algebras (AIHA) (extending an earlier definition by Knuth [4] of
one operator Huffman algebras). We also define arrangement
decreasing Huffman algebras (ADHA) when the inequality in the
Hardy-Littlewood-Polya inequality is reversed. We show that if
we define the cost of a tree T as w(T) = ⊕_i (p_i ⊗ γ^{l_i}),

then the Huffman algorithm on an AIHA (resp. ADHA) will
produce an optimal tree that minimizes (resp. maximizes) the cost
of the tree. (γ is a constant that represents the cost of one level
of the tree.)
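
The greedy construction described above can be sketched for a generic pair of operators. This is a minimal illustration under stated assumptions, not the authors' formulation: the function name `huffman_root_weight` is hypothetical, and the merge rule (two smallest weights a and b are replaced by γ ⊗ (a ⊕ b)) is an assumption chosen so that the root weight equals the cost of the tree that the greedy procedure builds.

```python
import heapq


def huffman_root_weight(weights, oplus, otimes, gamma):
    """Greedy bottom-up merge over a two-operator algebra.

    Assumed merge rule: the two smallest weights a and b are replaced by
    otimes(gamma, oplus(a, b)), where gamma is the cost of one tree level.
    Returns the weight left at the root after all merges.
    """
    heap = list(weights)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)   # smallest remaining weight
        b = heapq.heappop(heap)   # second smallest remaining weight
        heapq.heappush(heap, otimes(gamma, oplus(a, b)))
    return heap[0]
```

With ⊕ = max, ⊗ = + and γ = 1, the weights 1, 2, 3, 4 give a root weight of 5, which is max_i (p_i + l_i) for the tree with leaf depths 3, 3, 2, 1. With ⊕ = +, ⊗ = · and γ = 2, four unit weights give 16, which is Σ_i p_i γ^{l_i} for the balanced tree of depth 2.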


Table 1. Huffman algebras for deterministic weights

Algebra (S, ⊕, ⊗)    Type    Objective
(R+, +, ·)           AIHA    min Σ_i p_i · γ^{l_i}
(R, max, +)          AIHA    min max_i (p_i + γ l_i)
(R, min, +)          ADHA    max min_i (p_i + γ l_i)
(R, max, min)        AIHA    min max_i (min(p_i, γ))
(R+, max, ·)         AIHA    min max_i (p_i · γ^{l_i})
(R+, min, ·)         ADHA    max min_i (p_i · γ^{l_i})

Just as the entropy of a random variable is a fundamental
lower limit to the expected length of the Huffman code, we
show that the various examples of Huffman algebras in Table 1
have analogous fundamental limits. In particular, we define
quantities I(p) and J(p), with J(p) = \log_2 ( \sum_{i=1}^{m} 2^{p_i} ),
and show that [3, 5]

J(p) − 1 < max_T min_i (p_i + l_i) ≤ J(p) ≤ min_T max_i (p_i + l_i) < J(p) + 1.

The quantity I(p) is a scaled version of the Renyi entropy of
the distribution p. The quantity J(p) can be identified as the
fundamental limit in a model of parallel processing in which tasks
of length p_1, p_2, ..., p_m are executed in parallel and their results
are combined two at a time. We also define the concept of tree
extensions that is analogous to block coding for source codes, and
show that it is possible to get arbitrarily close to the fundamental
limits using tree extensions.

Random weights: When the weights of the leaves are
independent random variables, we ask the question: when is the
Huffman algorithm optimal in terms of expected cost? To
answer this question, we define Huffman algebras for independent
random variables. Random weights introduce many new
difficulties. For example, there is no total order among random
variables; instead, various partial orderings have been defined, such
as likelihood ratio ordering, stochastic ordering, and increasing
convex ordering. Chang and Yao [1] derived stochastic versions
of Hardy-Littlewood-Polya inequalities using three different
partial orderings for the random variables.

Motivated by the requirements for the Hardy-Littlewood-Polya
inequality, we define an arrangement increasing Huffman algebra
(S, ⊕, ⊗, ≤_1, ≤_2, ≤_3) as a set with two operators ⊕ and ⊗
and three different partial orders ≤_1, ≤_2, ≤_3, which satisfies
various consistency properties for the orderings and also satisfies the
Hardy-Littlewood-Polya inequality [1]. For Huffman algebras that
are closed under ⊕ and ⊗, we can show that the Huffman
algorithm produces the optimal tree. However, most Huffman
algebras are not closed, and in these cases, it is difficult to proceed
with the Huffman algorithm after the first step, since the new
weight produced by combining the lowest two weights is not
directly comparable with the other weights.

However, in the special case when the weights are i.i.d., we can use
the concepts of stochastic majorization to prove that the most
balanced tree minimizes the expected cost w(T) for arrangement
increasing Huffman algebras. The "most balanced" tree can be
constructed from the top down using a procedure called the "power
of 3" rule [3]. However, the "most balanced" tree is not always
optimal for ADHAs, and we conclude with a counterexample to
illustrate that our intuition about "balanced" trees is not always
valid for the stochastic case.
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The problem of designing a sequence of
optimal binary tests for the identification of
a single faulty component is addressed. For
components in linear order this is equivalent
to the classical alphabetic coding problem
solved by Hu and Tucker. For partially
ordered components the problem is more
difficult. Here, the problem is solved by
reduction to a minimization over a set of
alphabetic problems.

Our problem is, for a given partial
order and assignment of probabilities to the
nodes in its Hasse diagram (which correspond
to the components), to find a sequence of
upper sets to test so as to minimize the
average number of tests to locate the
defective node. An upper set is defined by:

Definition: If X is a node, the upper set of
X, D(X) = {Y : Y ≥ X}, is X together with the
set of all nodes greater than X in the partial
ordering.
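
The upper set of a node can be computed directly from the links of the Hasse diagram. The sketch below is illustrative only: the function name `upper_set` and the dictionary representation `covers` (mapping each node to the nodes immediately above it) are assumptions made for the example, and the node labels are hypothetical.

```python
def upper_set(x, covers):
    """Return D(x): x together with every node above it in the partial order.

    covers maps a node to the set of nodes linked immediately above it in
    the Hasse diagram (an assumed representation for this sketch).
    """
    seen = {x}
    stack = [x]
    while stack:
        node = stack.pop()
        for above in covers.get(node, ()):
            if above not in seen:       # follow each upward link once
                seen.add(above)
                stack.append(above)
    return seen


# Hypothetical example: two chains A1 > A2 > C and B1 > C sharing the
# minimal element C.
EXAMPLE_COVERS = {"C": {"A2", "B1"}, "A2": {"A1"}}
```

In this example the upper set of the minimal element C is the whole node set, while the upper set of B1 is just {B1}.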

The sequence of upper sets to be tested can be
represented as a binary search tree. A search
tree which minimizes the average number of
tests is called an optimal search tree.

Brute force examination of all search
trees consistent with the partial order is, in
general, not a feasible approach to minimizing
the average number of tests. For some
particular sets of probabilities assigned to
the components there are known ways to
restrict the search. A number of these are
given by Gilbert and Moore in the linear order
case. One method for partial orders is
provided in


of the form AC or BC by (A+B)C and
replacing links of the form CA or CB by C(A+B)
links. Newly redundant links are eliminated.
One new Hasse diagram is created for each link
in the original. The collection of
trees consistent with these diagrams with
composite nodes may include some which are no
longer consistent with the original diagram.
These can be identified easily, since they
contain upper sets which do not belong to the
original partial order.

We eliminate intermediate Hasse diagrams which
introduce upper sets not found in the original
Hasse diagram. It is thus clear that by
construction the set of search trees over
which we now optimize is exactly the set
consistent with the original partial order.

We can repeat the process for the
consistent intermediate Hasse diagrams whose
solutions are not yet available to us. Once
all intermediate problems are solved and their
associated expected number of tests
determined, the minimum expected number of
tests solution is the solution to the original
partial order problem.

There are two simple types of
intermediate problem whose solutions are
available to us. These are linear orders and
"V-orders".

Definition: A partial order is a V-order if it is
the union of two linear orders which have
precisely their minimal element in common.


2:  A  V  -  order.  A,  > 


>  A 


optiml*  saar^  tree  as  the  linear  ordar.  A,  > 
.  >  ...  >  Ai«  >  A,  >  A.-x  >  •••  >  A*  >  Ax. 


>  Aj.x  >  Ax,  has  tha 


Theorem 1: In an optimal search tree, at
least one node of minimum probability must
form a sibling pair with a node to which it is
linked in the Hasse diagram.

Nevertheless, we require a more powerful
approach in order to make progress on the
general partial order search problem. We can
systematically decompose any partially ordered
problem into a set of linearly ordered
problems from which the original can be solved
indirectly. We use the Hu-Tucker algorithm on
each linear problem. The linear problems
solved by the Hu-Tucker algorithm involve
composite nodes, corresponding to
sibling pairs in the search tree, which
must be expanded back out in evaluating the
average number of tests required by the
particular candidate solution. The final tree
is the minimum over the several Hu-Tucker
solutions.

More specifically, for each link, AB, in the
Hasse diagram, consider the Hasse diagram
formed by substituting the composite node A+B
(with probability given by the sum of P_A and P_B
in obvious notation) for A and B and replacing links


To summarize, the proposed approach
consists of the following. For each link in
the Hasse diagram form a new diagram with a
composite node whose new probability is the
sum of the previous node probabilities. If
new upper sets not present in the original
Hasse diagram have been created, do not
consider the new diagram further. If the new
diagram is a linear order or a V-order, find
its optimal test tree by the Hu-Tucker
algorithm. Split the composite node into a
pair of sibling nodes in the final test tree,
and calculate the cost of the tree. If the
new diagram cannot be solved immediately,
begin the process of forming composite nodes
again with that diagram. The optimal
tree for the original problem is the minimum
cost solution from among these candidate
trees.
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Summary 

When variable length codes such as Huffman codes are
transmitted through a noisy channel, any bit error can lead
to loss of synchronization at the decoder and cause error
propagation. Synchronizing codewords (SCs) have been
previously proposed to resynchronize the decoder regardless
of any preceding synchronization slippage. However,
SCs retain one significant disadvantage. Although the
decoder will be synchronized after decoding an SC, the decoded
symbols after the SC may be shifted since the number of
decoded symbols before the SC may be different from the
original number (due to decoding a variable-length code in
the presence of errors). Also, even though the decoder will
be synchronized after decoding a synchronizing codeword,
the decoder may not realize it was ever out of
synchronization.
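
The symbol-shift effect described above can be seen on a toy example. The sketch below is illustrative only: the four-word prefix code `CODE` and the helper names `encode`, `decode` and `flip` are hypothetical and are not taken from the paper.

```python
# Hypothetical binary prefix code used only for illustration.
CODE = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(symbols, code=CODE):
    """Concatenate the codewords of the given symbols."""
    return "".join(code[s] for s in symbols)


def decode(bits, code=CODE):
    """Greedy prefix decoding; trailing bits that match no codeword are dropped."""
    inverse = {w: s for s, w in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:          # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


def flip(bits, i):
    """Return bits with the bit at position i inverted (a single channel error)."""
    flipped = "1" if bits[i] == "0" else "0"
    return bits[:i] + flipped + bits[i + 1:]
```

Encoding "abcd" gives the bit string 010110111. Flipping the first bit makes the decoder output "ccd": the last symbol is decoded correctly, but only three symbols come out instead of four, so every later symbol would be placed one position too early.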

To overcome the drawbacks of synchronizing codewords,
we introduce the concept of extended synchronizing
codewords (ESCs). ESCs can guarantee both codeword and
symbol synchronization, so that the symbols after the ESC
will be decoded correctly and will be put in their correct
positions. Thus, ESCs can be used as markers in the bit
stream to prevent propagation of both decoding errors and
symbol shift errors. An application of this to image coding
has been studied in [3]. We give a formal definition of
ESCs and derive some bounds on the amount of overhead
necessary in designing a binary prefix code with an ESC,
and study relationships between ESCs and SCs.

In this paper, we consider only binary prefix codes. For a
source $S$ with symbols $s_1, \ldots, s_n$, denote the probability
distribution of the symbols by $(p_1, \ldots, p_n)$, and assume
$p_i \ge p_{i+1}$. A code $C = \{c_1, \ldots, c_n\}$ is associated with the
source $S$ if $c_i$ is assigned to the symbol $s_i$. The length of $c_i$ is
denoted by $l_i$, and the average codeword length of $C$ is
denoted by $E(C) = \sum_{i=1}^{n} p_i l_i$.

Our notion of an ESC is formally defined as follows. A
codeword $c_w$ of a prefix code $C$ is defined to be an ESC if it
satisfies the following conditions: (1) for all $\alpha \ne c_w$, if
$c_w = \alpha\beta$ and $\alpha$ is a suffix of some codeword of $C$, then
$\beta = \gamma\delta$, where $\gamma$ is empty or a sequence of codewords, and
$\delta$ is not empty and is not a prefix of any codeword, and (2) $c_w$
is not a substring of any other codeword. These two conditions
guarantee that regardless of prior slippage the decoder will
not decode the ESC as parts of other codewords (as long
as there is no error in the ESC itself). Hence, the ESC will
be correctly decoded, so that the decoder knows that there
was an error and can resynchronise after the ESC.

* This work was supported in part by a grant from Siemens Corporate
Research of Princeton, NJ.

The following bounds on the amount of overhead needed
in designing a code with an ESC can be obtained. If a
source $S$ is designed to have a prefix code $C$ that has at least
one ESC, then $E(C) > E(H)$, where $H$ is the Huffman code
for $S$. On the other hand, there exists a prefix code $C$ with
an ESC of probability $p_n$, and $E(C) = E(H) + p_n l_n$. Also,
if the depth of a maximal complete subtree in a Huffman
code is $d$, then there exists a prefix code $C$ with an ESC
of probability $p_n$, and $E(C) = E(H) + p_n(d + 1)$. Using
results from [2] and [1] it follows that a source $S$ admits a
prefix code $C$ with an ESC of probability $p_n$ and $E(C) =
E(H)$ [...]. But we can show that for all $n \ge 6$, $p_n(d + 1)$
is a tighter bound than [...] (the two bounds are equal only
when $n = 6$ and $n = 8$). In fact, $p_n(d + 1) \to 0$ as $n \to \infty$.

Finally, some relationships between ESCs and SCs can
be obtained. For example, if a code $C$ has a SC $c_x$, then
$c_x c_i$ is an ESC for $C'$, where $C'$ has the same codewords
except that $C'$ leaves the codeword $c_i$ unused. Also, if a
source $S$ can be designed to have a SC $c_x$ (probability $p_x$)
with no overhead, then $S$ admits a code $C$ with an ESC
and $E(C) = p_i + p_x(l_i + 1)$, where $i$ can be any codeword
index.

We consider an IID source that generates binary data with
a distribution $P$ on $\{0, 1\}$. $H(P)$ denotes the entropy function for
$P$. $\Delta$ denotes an acceptable distortion that is a real constant in
$[0, 1]$. $W$ denotes a conditional probability on $\{0, 1\} \times \{0, 1\}$.
$R(P, \Delta)$ denotes the rate function for $P$ and $\Delta$ with Hamming
distortion $d$ that is measured by the normalized Hamming distance.

The proposed code system [1] encodes a binary message of
variable length (VL) into the pair of 1) a binary VL codeword and
2) a binary VL sequence called a side information. A sequence
of concatenated codewords and another sequence of concatenated side
informations can be mixed by time sharing and transmitted to
a decoder as a single binary sequence.

Encoder 

We suppose that $P$ and $W$, where $W(1|0) + W(0|1) > 1/2$ so
that $m > 2$ may be guaranteed in (*), are given beforehand. A
counter $c$ stores the number of the source data already processed
by the encoder. Each of the variables $A$ and $C$ can assume reals in
$[0, 1]$. The encoder, after initialization
$c \leftarrow 1$, $A \leftarrow 1$, $C \leftarrow 0$,
begins the following Recursion.

Recursion: The encoder puts a source data $x_c$ in. When
$x_c = 0$, one of the following three cases $\alpha_0$, $\beta_0$ and $\gamma_0$ can occur.

In case $\alpha_0$ of $[C, C+A) \subset [0, W(0|0))$, if $[C, C+A) \not\subset [0,
W(0|1))$, then the processing, after update

$c \leftarrow c+1$, $A \leftarrow A \times PW(0)/W(0|0)$, $C \leftarrow C \times PW(0)/W(0|0)$,
reenters into Recursion. Otherwise, the processing escapes from
Recursion and enters into Termination.

In case $\beta_0$ of $[C, C+A) \subset [W(0|0), 1)$, the processing escapes
from Recursion and enters into Termination.

The third case $\gamma_0$ is $[C, C+A) \ni W(0|0)$, which includes $\alpha_0$
and $\beta_0$ half and half. In this case, the encoder reduces $[C, C+A)$
into either one of the subintervals $[C, W(0|0))$ and $[W(0|0), C+A)$.
Concretely, the encoder selects a random real $r$ uniformly in $[0,
1)$. (Transmission of $r$ to the decoder side is unnecessary.) If
$r < (W(0|0) - C)/A$, then $[C, C+A)$ is reduced into the first
subinterval $[C, W(0|0))$ as

$A \leftarrow W(0|0) - C$, $C \leftarrow C$,

and the current case $\gamma_0$ returns to $\alpha_0$. Otherwise, $[C, C+A)$ is
reduced into the second subinterval $[W(0|0), C+A)$ as
$A \leftarrow C + A - W(0|0)$, $C \leftarrow W(0|0)$,
and $\gamma_0$ returns to $\beta_0$.

These cases $\alpha_0$, $\beta_0$ and $\gamma_0$ can occur for $x_c = 0$. Alternatively,
when $x_c = 1$, symmetric cases $\alpha_1$, $\beta_1$ and $\gamma_1$ can occur (abbreviated).

Termination: If $[C, C+A) \subset [0, W(0|1))$, then $c$, $A$ and $C$
are updated as

$c \leftarrow c+1$, $A \leftarrow A \times PW(0)/W(0|1)$, $C \leftarrow C \times PW(0)/W(0|1)$.
Otherwise, that is, if $[C, C+A) \subset [W(0|0), 1)$, then $c$, $A$ and $C$ are
updated as

$c \leftarrow c+1$, $A \leftarrow A \times PW(1)/W(1|0)$, $C \leftarrow 1 - (1 - C) \times PW(1)/W(1|0)$.

Next, finding the minimum length $l$ such that
$[0.y, 0.y + 2^{-l}) \subset [C, C+A)$, where $0.y = \sum_{i=1}^{l} y_i 2^{-i}$,
for at least a binary sequence $y = y_1 \cdots y_l$, the encoder puts $y$ out.
This $y$ is the codeword for a message $x = x_1 \cdots x_k$, where $k$ denotes
the final value of $c$.
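
The codeword selection at the end of Termination can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `shortest_codeword` and the use of exact rational arithmetic are choices made here, the final interval $[C, C+A)$ is taken as given, and the empty string is allowed as a codeword when the interval is all of $[0, 1)$.

```python
from fractions import Fraction
from math import ceil


def shortest_codeword(C: Fraction, A: Fraction) -> str:
    """Return the shortest binary string y = y_1...y_l whose dyadic interval
    [0.y, 0.y + 2**-l) lies inside [C, C + A).

    Hypothetical helper; assumes 0 <= C, A > 0 and C + A <= 1.
    """
    l = 0
    while True:
        scale = 2 ** l
        # Smallest grid point n / 2**l that is not below C. If any n works
        # for this l, the smallest admissible one does.
        n = ceil(C * scale)
        if Fraction(n + 1, scale) <= C + A:
            # n < 2**l holds because (n + 1) / 2**l <= C + A <= 1.
            return format(n, "b").zfill(l) if l > 0 else ""
        l += 1
```

For example, with $C = 1/4$ and $A = 1/2$ no one-bit string fits, and the two-bit string `01` (the interval $[1/4, 1/2)$) is returned.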

The encoder also puts out a side information $z$ that is composed
of i) a $\lceil \log_2(\lfloor k/m \rfloor + 1) \rceil$-length juxtaposition of 1s, where

$m = \lceil 1/(W(1|0) + W(0|1)) \rceil$, (*)

ii) a 0, iii) the $\lceil \log_2(\lfloor k/m \rfloor + 1) \rceil$-bit integer expression of the value of
$\lfloor k/m \rfloor$, and iv) the $\lceil \log_2 m \rceil$-bit integer expression of the value of
$k - m \times \lfloor k/m \rfloor$.

Decoder 

The decoder receives from the encoder a concatenation of
codewords. However, there is no boundary between the first codeword
$y$ and the second codeword $y'$. Therefore the decoder has no choice
but to reproduce data by using some $y$-prefixed sequence $\hat{y} = yy' \cdots$
instead of using $y$ directly.


The decoder also receives a concatenation of side informations.
The first side information $z$ is separable in the following
way. 1) Read the length $n+1$ of a segment $1 \cdots 10$. 2) Regard the
$n$-length sequence $\zeta$ prefixed by $1 \cdots 10$ as the $n$-bit integer
expression of the value of $\lfloor k/m \rfloor$, where $m$ is given by (*). 3) Regard the
$\lceil \log_2 m \rceil$-length sequence prefixed by $\zeta$ as the $\lceil \log_2 m \rceil$-bit integer
expression of the value of $k - m \times \lfloor k/m \rfloor$.
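
The side-information format and its separation rule can be sketched in a few lines. The function names `encode_side_info` and `decode_side_info` are hypothetical, and the sketch relies on two integer identities: $\lceil \log_2(q+1) \rceil$ equals the bit length of $q$, and $\lceil \log_2 m \rceil$ equals the bit length of $m-1$.

```python
def encode_side_info(k: int, m: int) -> str:
    """Build z = 1...1 0 <floor(k/m)> <k mod m> for message length k."""
    q, r = divmod(k, m)
    n = q.bit_length()             # ceil(log2(q + 1))
    r_bits = (m - 1).bit_length()  # ceil(log2(m))
    q_field = format(q, "b").zfill(n) if n else ""
    r_field = format(r, "b").zfill(r_bits) if r_bits else ""
    return "1" * n + "0" + q_field + r_field


def decode_side_info(bits: str, m: int):
    """Parse the first side information from a concatenation.

    Returns (k, number of bits consumed)."""
    n = bits.index("0")            # step 1: length of the run of 1s
    pos = n + 1
    q = int(bits[pos:pos + n], 2) if n else 0           # step 2
    pos += n
    r_bits = (m - 1).bit_length()
    r = int(bits[pos:pos + r_bits], 2) if r_bits else 0  # step 3
    pos += r_bits
    return m * q + r, pos
```

With $k = 11$ and $m = 4$ this gives `1101011`: two 1s, a 0, the two-bit value of $\lfloor 11/4 \rfloor = 2$, and the two-bit value of the remainder 3. Because the format is self-delimiting, the parser stops after seven bits even when further side informations follow.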

Now, let a counter $c$ store the number of the data already
reproduced. A variable $A$ can assume reals in $[PW(1)/W(1|0) - 1,
PW(0)/W(0|1))$, and another variable $C$, in $[1 - PW(1)/W(1|0),
PW(0)/W(0|1))$. The decoder, after initialization
$c \leftarrow 0$, $A \leftarrow 1$, $C \leftarrow 0$,

retraces the Recursion and Termination of the encoder in the following
way.

Retrace of Termination: In the Termination of the encoder,
before the update of $A$ and $C$, either $[C, C+A) \subset [0, W(0|1))$ or $[C,
C+A) \subset [W(0|0), 1)$ is true. By updating $A$ and $C$ in the former
case, it follows that $[C, C+A) \subset [0, PW(0))$, that is, $0.\hat{y} \in [0.y,
0.y + 2^{-l}) \subset [C, C+A) \subset [0, PW(0))$. By updating $A$ and $C$ in the
latter case, it follows that $[C, C+A) \subset [PW(0), 1)$, that is, $0.\hat{y} \in [0.y,
0.y + 2^{-l}) \subset [C, C+A) \subset [PW(0), 1)$. These cases can occur alternatively
and the following rule for retrace of Termination can discriminate
them.

If $0.\hat{y} \in [0, PW(0))$, then the decoder, after update
$c \leftarrow c+1$, $A \leftarrow A \times PW(0)/W(0|1)$, $C \leftarrow C$,
puts a data $\hat{x}_c = 0$ out. Otherwise, the decoder, after update
$c \leftarrow c+1$, $A \leftarrow A \times PW(1)/W(1|0)$, $C \leftarrow C + A \times (1 - PW(1)/W(1|0))$,
puts $\hat{x}_c = 1$ out.

After execution of either one of the above operations, the
decoder retraces the Recursion of the encoder in the following way.

Retrace of Recursion: If $0.\hat{y} \in [C, C + A \times PW(0))$, then
the decoder, after update

$c \leftarrow c+1$, $A \leftarrow A \times PW(0)/W(0|0)$, $C \leftarrow C$,
puts a data $\hat{x}_c = 0$ out. Otherwise, the decoder, after update
$c \leftarrow c+1$, $A \leftarrow A \times PW(1)/W(1|1)$, $C \leftarrow C + A \times (1 - PW(1)/W(1|1))$,
puts $\hat{x}_c = 1$ out. After execution of either one of the above operations,
if the counter $c$ is less than $k$, then the processing reenters
into retrace of Recursion. Otherwise, the processing escapes from
retrace of Recursion.

After the escape, regard the reverted sequence $\hat{x}_k \cdots \hat{x}_1$
as the reproduced message that corresponds to the original message
$x = x_1 \cdots x_k$. Further, $y$ is reproducible only from $\hat{x}_k \cdots \hat{x}_1$
without using the random numbers. Hence the decoder can separate
$y$ uniquely from $\hat{y}$, and can proceed to processing of the next
codeword.


Compression Efficiency

We consider an arbitrary $W$ achieving the minimum mutual
information $I(P, W)$ on condition that

$\sum_{(x,y) \in \{0,1\} \times \{0,1\}} P(x) W(y|x) d(x, y) = P(0)W(1|0) + P(1)W(0|1) < \Delta$.

If $\Delta$ is small so that the condition [...]
may be satisfied, then the redundancy $\rho$ defined by

$\rho = \dfrac{(\text{the expectation of codeword length}) + (\text{the expectation of side information length})}{(\text{the expectation of message length})}$

is bounded as

$\rho < R(P, \Delta) + (W(1|0) + W(0|1)) \times$ [...],

the right hand side of which is close to $R(P, \Delta)$ for small $\Delta$.

In any communication network, facility failures need to be detected in a
timely manner so that necessary protection switching functions can be initiated
without much delay and loss of customer data. Older transmission systems such as T1
trunks base this decision on a count of the severely errored seconds (SES).
When "many" consecutive SES's occur, a facility failure detection scheme
concludes that the line is not reliable, and initiates an alarm, which in turn initiates
a changeover of the transmission link to spare capacity. For transmission
systems that employ a packet format with total or partial cyclic redundancy check
(CRC) fields, the information on the line errors is available at the receiver after
performing a CRC check. Such transmission systems include many data link
layer transmission protocols such as HDLC, SDLC, LAPD, and LAPB, or the
physical layer schemes such as the SONET transmission frame, the ATM cell
structure, etc. Alternatively, the corrupted flags of a data link layer protocol
can be used, or special sampling frames can be transmitted for facility
monitoring purposes. For example, in Signaling System No. 7 (SS7), fill-in signal
units (FISU's) are transmitted for this purpose over the signaling network. For
many transmission functions, in particular for signaling, this detection should
be very fast. Existing algorithms for this purpose usually involve some
integration so that intermittent SES's or bad CRC's do not cause alarms when the
facility is healthy, or conversely, intermittent non-severely errored seconds or
good CRC's do not prevent a changeover when the facility is faulty. Usually,
this filter is designed with deterministic specifications, without any stochastic
modeling of the source.

The deterministic filter designed for tracking the low-frequency trends in
SES's or CRC's is a linear and time-invariant system. On the other hand,
the underlying process one would like to estimate is a nonstationary or
time-varying stochastic process. A linear and time-invariant system would be
inferior to an adaptive nonlinear scheme based on a good stochastic model
of the source, designed to optimize an objective performance criterion. Even
if its parameters are not adjusted adaptively, a stochastic nonlinear source
model can greatly improve the performance of the failure detection system.
One such model is a finite-state model, or a Markov model. It is possible to
generate detailed Markov models, but in its simplest form, the Markov model
for the facility consists of a "good state" and a "bad state", where good state
refers to a healthy facility, and bad state refers to a failed facility. Associated
with each state is a probability of a bad checksum, equal to $p$ for the good
state, and equal to $1-q$ for the bad state. The system is in good state at time
$k$, conditioned that it was in good state at time $k-1$, with probability $1-a$, and
it is in bad state at time $k$, conditioned that it was in bad state at time $k-1$,
with probability 1. Typically, $p \ll 1$, $q \ll 1$, $a \ll 1$. Such Markov models,
known as hidden Markov models, are used in many fields such as ecology,
cryptanalysis, and most importantly, speech recognition.

Fitting a time series of good and bad checksums, i.e., a sequence of 0's
and 1's, to this model can be achieved via dynamic programming or the Viterbi
algorithm, provided that a meaningful performance criterion is chosen. The
most commonly used performance criterion for this purpose is known as the
maximum likelihood criterion. Based on the observation of a good (G) or
bad (B) checksum, the Viterbi algorithm associated with the model above
proceeds as follows.

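1. $D_G(0) = 0$, $D_B(0) = -\infty$, $k = 1$.

2. $D_G(k) = D_G(k-1) + \log(1-a) + \begin{cases} \log(1-p) & \text{if G is observed} \\ \log p & \text{if B is observed} \end{cases}$

   $D_B(k) = \max[D_G(k-1) + \log a,\ D_B(k-1)] + \begin{cases} \log q & \text{if G is observed} \\ \log(1-q) & \text{if B is observed} \end{cases}$

3. If $D_B(k) > D_G(k)$, initiate a changeover. Else, set $k \leftarrow k+1$, go to 2.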

The probabilities $p$ and $q$, and the conditional probability $a$ can be estimated
from real data, obtainable from transmission statistics. Alternatively, hidden
Markov model training techniques can be used.
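
The two-state Viterbi recursion can be sketched as below. This is a minimal illustration, not the authors' code: the function name `viterbi_failure_detector` and the encoding of observations as the characters `G` and `B` are assumptions, and the emission terms follow the model stated above (a bad checksum has probability $p$ in the good state and $1-q$ in the bad state).

```python
import math


def viterbi_failure_detector(observations, p, q, a):
    """Return the first time k (1-based) at which the bad-state path metric
    exceeds the good-state metric, or None if no changeover is triggered.

    observations: iterable of 'G' (good checksum) or 'B' (bad checksum).
    Assumes 0 < p, q, a < 1.
    """
    d_good, d_bad = 0.0, -math.inf
    for k, obs in enumerate(observations, start=1):
        bad = obs == "B"
        emit_good = math.log(p) if bad else math.log(1 - p)
        emit_bad = math.log(1 - q) if bad else math.log(q)
        # Both updates use the metrics from time k - 1.
        new_good = d_good + math.log(1 - a) + emit_good
        new_bad = max(d_good + math.log(a), d_bad) + emit_bad
        d_good, d_bad = new_good, new_bad
        if d_bad > d_good:
            return k
    return None
```

With $p = q = 0.01$ and $a = 0.001$, an isolated bad checksum among good ones does not trigger a changeover in this sketch, while two consecutive bad checksums do.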

The performance evaluation for a failure detection scheme should be
based on how long it takes to declare a line failed after an actual failure,
as well as how frequently the scheme declares a healthy line failed. We will
call the first of these qualities the detection delay, $T_D$, and the second, the
probability of false alarm, $P_{FA}$. A good failure detection scheme minimizes
both of these quantities.


In order to benchmark the performance of the stochastic model, we use
the conventional leaky integration scheme known as the leaky bucket. In this
scheme, there is a counter, which is incremented every time a bad checksum
occurs, and decremented, whenever the count is positive, once for every fixed
number of good checksums received (the decrement interval). A changeover is
initiated when the counter reaches a threshold. The values used for the
decrement interval and the threshold for the SS7 network are 256 and 64,
respectively.
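
A leaky bucket of this kind can be sketched as follows. The function name `leaky_bucket_detector` and the parameter names `threshold` and `decrement_interval` are assumptions introduced here, and the sketch takes one reading of the description: good checksums are counted continuously, and each time the decrement interval is reached the counter drops by one if it is positive.

```python
def leaky_bucket_detector(observations, threshold=64, decrement_interval=256):
    """Return the first time k (1-based) at which the counter reaches
    `threshold`, or None if no changeover is triggered.

    observations: iterable of 'G' (good checksum) or 'B' (bad checksum).
    """
    count = 0
    good_seen = 0
    for k, obs in enumerate(observations, start=1):
        if obs == "B":
            count += 1
            if count >= threshold:
                return k
        else:
            good_seen += 1
            if good_seen == decrement_interval:
                good_seen = 0
                if count > 0:
                    count -= 1  # leak one unit per decrement interval
    return None
```

With the default values, 64 consecutive bad checksums are needed before a changeover, which illustrates why the detection delay of this scheme is tied directly to the threshold.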

Modeling the temporal behavior of a channel at the receiver end as a
Markov source is quite common. With this motivation, we simulated the data
at the receiver as the output of a Markov source (different from the one used for
failure detection). Picking a block size of $B = 16$, we assumed the source to
consist of 17 states, each roughly corresponding to the number of errors in the
block, from 0 to 16. The transition from one state to its neighbors is governed
by a geometric rule: the farther a neighbor is, the geometrically harder it is to get to
it. States 8-16 are bad states; there is no way the system can go back to the
good states 0-7 once it enters one of the bad states. For the observations,
we introduce another degree of randomness into the model. The number of
errors that the model generates when at state $i$ is a random variable whose
mean is $i$ and whose standard deviation is proportional to $i$. This model is
relatively arbitrary; however, it captures the important features of facility failure.
A Markov source is commonly used for modeling receiver data. The block size
16 is chosen as a compromise between performance and complexity. This
size is sufficient for this scheme to have better performance than the leaky
bucket, but smaller block sizes may have even better performance. We would
like the system to exhibit an absorbing state, or a super-state, and we pick
states 8-16 for this purpose. The geometrical rule is one way to assign higher
transition probabilities to closer neighbors; other possibilities exist, but it is not
very likely that this will change the overall result. Making the number of bad
checksums at each state random introduces an extra degree of randomness
so that the failure detection algorithm does not simply learn to keep track of
the number of errors and determine the state of the model, i.e., it hides the
underlying model from the observer. By design, we pick the mean of the
errors at each state equal to the state. In order to increase the uncertainty
at the larger states, we also pick the standard deviation proportional to the
state. Other models are possible, but again, this is not expected to change
the overall result significantly.

Simulations show that large values of the threshold are associated with small
values of the false alarm probability, and conversely, small values of the threshold are
associated with small values of the detection delay. Although the effect is
more minor, small values of the decrement interval can be observed to be associated with a large
probability of false alarm, and large values with small detection
delay. In summary, it is not possible to optimize either the threshold or the
decrement interval for minimizing both $P_{FA}$ and $T_D$. The solution seems to be reaching a compromise
between $P_{FA}$ and $T_D$. Our simulations showed that a threshold of 64
and a decrement interval between 32 and 256 yield the best solution, in line with the SS7 standard. On
the other hand, all the hidden Markov models studied substantially outperform
the leaky bucket, achieving both $P_{FA}$ and $T_D$ results significantly better than
the leaky bucket.

Asymptotic Non-stationary Behavior of Statistical Multiplexing
with Multiple Types of Traffic

Abstract

The cell loss probability is a major performance factor in designing
an ATM (asynchronous transfer mode) network for broadband
integrated services of multi-media communications. The dynamic
behavior of an ATM network needs to be well understood
because of its extremely high speed and diversity of traffic. We
analyse the transient buffer overflow probability of a statistical
multiplexer with multiple types of traffic by taking a spectral
representation approach. The joint distribution is obtained in the
Laplace transform domain in analytic form. An asymptotic behavior
is characterised by simple parameters of what we term the
"dominant" type traffic.

Summary 


We assume that there are $M$ types of sources, and the traffic of type $m$
is characterised by the arrival of "bursts" with Poisson rate $\lambda_m$. The
burst length is exponentially distributed with mean $1/\mu_m$, and each burst
generates cells at the rate of $R_m$ [cells/sec]. The output link capacity
is denoted by $C$ [cells/sec]. To make the system stable, we require

$\sum_{m=1}^{M} \frac{\lambda_m}{\mu_m} R_m < C.$

Let $J_m(t)$ be the number of type $m$ bursts at time $t$. The aggregate
rate of cell arrivals at the multiplexer is then given by $R(t) =
\sum_{m=1}^{M} R_m J_m(t)$. When $R(t)$ exceeds $C$ [cells/sec], all the cells cannot
be handled immediately. Let $Q(t)$ denote the number of cells outstanding
in the output buffer, and define


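$P_{\mathbf{i}}(t, x) = \mathrm{Prob}\{J_m(t) = i_m,\ 1 \le m \le M;\ \text{and}\ Q(t) \le x\}.$ (1)

Let $\mathbf{P}(t, x)$ be the column vector that consists of all the $P_{\mathbf{i}}(t, x)$. Following
[1], we can derive a matrix differential equation for $\mathbf{P}(t, x)$:

$\frac{\partial}{\partial t}\mathbf{P}(t, x) + \mathbf{D}\frac{\partial}{\partial x}\mathbf{P}(t, x) = \mathbf{M}\,\mathbf{P}(t, x)$ (2)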


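where

$\mathbf{M} = \mathbf{M}_1 \oplus \mathbf{M}_2 \oplus \cdots \oplus \mathbf{M}_M$

$\mathbf{D} = \mathbf{R}^{(1)} \oplus \mathbf{R}^{(2)} \oplus \cdots \oplus \mathbf{R}^{(M)} - C\,\mathbf{I}.$

Here $\otimes$ and $\oplus$ represent Kronecker product and Kronecker sum, respectively,
and $\mathbf{I}$ is the identity matrix of infinite dimension. And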


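$\mathbf{M}_m = \begin{bmatrix} -\lambda_m & \mu_m & 0 & \cdots \\ \lambda_m & -(\lambda_m + \mu_m) & 2\mu_m & \cdots \\ 0 & \lambda_m & -(\lambda_m + 2\mu_m) & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$

$\mathbf{R}^{(m)} = \mathrm{diag}[0, R_m, \ldots, i_m R_m, \ldots].$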


where $k_m$ is the $m$th element of an integer vector $\mathbf{k} = [k_1, \ldots, k_M]$:
$k_m = 0, 1, 2, \ldots$ and $m = 1, 2, \ldots, M$.

The eigenvalue $u(s)$, involved in the above equation, is given by
solving

[...] $uC - s.$ (6)


If we denote by $u_{\mathbf{k}}(s)$ the eigenvalue for the integer vector $\mathbf{k}$, and let $\mathbf{V}_{\mathbf{k}}(s)$
and $\mathbf{U}_{\mathbf{k}}(s)$ be the corresponding right and left eigenvectors, respectively,
then they can be represented as


VkW  =  Vi.(a)®  -®V*„(s) 

Ifk(-)  =  (C-‘)*Vk(s) 


with 

If,'(s)DVk(a)  =  %  (7) 

and  C  is  a  diagonal  matrix  of  infinite  dimension  given  by 

C=  Cl®  -  ®  Cj#,  Cm  =  diag[l,.  -,y(^)i^,-  -l 

It can be shown from [4] that the number of the positive eigenvalues,
denoted by $u_{\mathbf{k}}(s)$ and derived from Eq.(6), is equal to the number of $\mathbf{k}$'s
that satisfy

$\sum_{m=1}^{M} R_m k_m < C$ (8)
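
The stability requirement and the count in condition (8) are easy to evaluate numerically for small examples. The sketch below is illustrative only: the function names `is_stable` and `count_underload_vectors` are hypothetical, all rates are assumed positive, and the stability test assumes the mean burst length is $1/\mu_m$.

```python
from itertools import product


def is_stable(lam, mu, R, C):
    """Check sum_m (lam_m / mu_m) * R_m < C for per-type parameter lists."""
    return sum(l / m * r for l, m, r in zip(lam, mu, R)) < C


def count_underload_vectors(R, C):
    """Count the integer vectors k >= 0 with sum_m R_m * k_m < C."""
    # k_m can be at most floor(C / R_m); larger values already violate (8).
    ranges = [range(int(C // r) + 1) for r in R]
    return sum(
        1
        for k in product(*ranges)
        if sum(r * km for r, km in zip(R, k)) < C
    )
```

For two traffic types with $R_1 = 1$, $R_2 = 2$ and $C = 4$, the vectors satisfying (8) are $(0,0), (1,0), (2,0), (3,0), (0,1), (1,1)$, so the count is 6.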

Thus, the unknown transient boundary condition $\mathbf{P}^*(s, 0)$ can be
determined by a set of linear constraint equations

$\mathbf{U}_{\mathbf{k}}'(s)\,[\mathbf{P}^*(0, u_{\mathbf{k}}(s)) + \mathbf{D}\mathbf{P}^*(s, 0)] = 0$ (9)

Since the dimension of this matrix equation is infinite, we should be
interested in the most dominant (largest negative) eigenvalue $u_{dom}(s)$,
which is obtained by setting $\mathbf{k} = \mathbf{0}$ in Eq.(6). This dominant term
will be of practical importance when we consider an asymptotic buffer
behavior, i.e., when $x$ is large enough. The transient probability that
the buffer content $Q(t)$ exceeds some predetermined buffer capacity $B$
[cells] is approximately given, for large $B$, by

$G^*(s, B) = \mathcal{L}\{\mathrm{Prob}\{Q(t) > B\}\} \approx b(s) \exp\{u_{dom}(s) B\}$ (10)

where $b(s) = -\mathbf{U}_{\mathbf{0}}'(s)\,[\mathbf{P}^*(0, u_{dom}(s)) + \mathbf{D}\mathbf{P}^*(s, 0)]\,\mathbf{V}_{\mathbf{0}}(1; s)$.

We can show that the dominant eigenvalue $u_{dom}(s)$ lies between
[...] and 0 for all $s > 0$.


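In order to solve Eq.(2), we first take the double Laplace transform
$(t, x) \leftrightarrow (s, u)$ on $\mathbf{P}(t, x)$, i.e., $\mathbf{P}(t, x) \leftrightarrow \mathbf{P}^{**}(s, u)$, and use $\mathbf{P}^*(s, 0)$
and $\mathbf{P}^*(0, u)$ to denote the Laplace transforms of $\mathbf{P}(t, 0)$ and $\mathbf{P}(0, x)$,
respectively. Equation (2) then becomes

$\mathbf{P}^{**}(s, u) = (u\mathbf{D} + s\mathbf{I} - \mathbf{M})^{-1}\,[\mathbf{P}^*(0, u) + \mathbf{D}\mathbf{P}^*(s, 0)].$ (3)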

Let us solve the eigenvalues with respect to $u$, i.e., $u\mathbf{D}\mathbf{V}(s) = (\mathbf{M} -
s\mathbf{I})\mathbf{V}(s)$, and let $\mathbf{V}(s)$ and $V(\mathbf{z}; s)$ be the corresponding right eigenvector
and its generating function. We assume that $V(\mathbf{z}; s)$ can be decomposed
as $V(\mathbf{z}; s) = \prod_{m=1}^{M} V_m(z_m; s)$; then it follows

[...] $= -s + uC + \sum_{m=1}^{M} \lambda_m (z_m - 1).$ (4)

The solution of Eq.(4) should have the following form:

$V_{k_m}(z_m; s) =$ [...] (5)

Abstract

A  communication  network  with  time-varying  topology  is 
considered  and  the  region  of  achievable  throughputs  under  any 
transmission  control  strategy  is  characterized.  The  topology  of 
the network is arbitrary. The topological property that varies
with time is the connectivity and/or the capacity of the links.
An underlying network state process is considered that reflects
the physical characteristics of the network that affect the link
transmission capacity. The capacities of the links are a function of
the state process, which has Markovian statistics. The transmissions
are scheduled dynamically based on information about the
link  capacities  and  the  backlog  in  the  network.  The  maximum 
achievable  throughput  is  characterized  and  a  scheduling  policy 
that  obtains  it  is  specified.  The  model  of  changing  topology  that 
is  considered  here  applies  to  TDMA  and  CDMA  networks  with 
mobile  users  and  networks  with  meteor-burst  communication 
channels. 

Summary 

The network model consists of $N$ transmitters and $M$ receivers.
Each transmitter may transmit to every receiver. The
transmission of transmitter $i$ to receiver $j$ at slot $t$ is successful
with some probability $Q_{ij}(t)$. A network with arbitrary topology
can be represented with the above model. The transmitters and
the receivers correspond to the network nodes and the connectivity
is mapped in the probabilities of successful transmissions.
If there is no communication link from transmitter $i$ to receiver
$j$ then $Q_{ij}(t) = 0$.

The time varying topology is represented by the variation
with time of the probabilities of successful transmission $Q_{ij}(t)$.
These probabilities depend on certain physical characteristics of
the network that change with time. In addition these probabilities
depend on which transmitters attempt transmission towards
which receivers at each slot. The physical characteristics of the
network that affect the probabilities of successful transmission
are captured by the underlying network state variable $S(t)$ which
takes values in the set $\mathcal{S} = \{s_1, \ldots, s_L\}$. In the case of a network
of mobile nodes, the underlying network state denotes the geographical
position of the nodes, while in a meteor-burst network
the state denotes the existence or absence of meteor-bursts for
the various links. Each transmitter at any slot may attempt to
transmit to one of the receivers or to idle. The transmission
attempts at slot $t$ are denoted by the binary transmission vector
$R(t) = (R_{ij}(t) : i = 1, \ldots, N,\ j = 1, \ldots, M)$, where $R_{ij}(t)$ is
equal to 1 if transmitter $i$ attempts transmission to receiver $j$
at that slot and 0 otherwise. Let $Q_{ij} : \mathcal{S} \times \{0, 1\}^{NM} \to [0, 1]$
be the function that determines the probability of success in
the transmission from $i$ to $j$ at $t$ based on $R(t)$, $S(t)$; that is,
$Q_{ij}(t) = Q_{ij}(S(t), R(t))$.

If the number of packets $X_{ij}(t)$ in transmitter $i$ with destination
the receiver $j$ at the end of slot $t$ is nonzero then a
packet is transmitted successfully to $j$ in slot $t + 1$ with probability
$Q_{ij}(t + 1)$ and independently of the past. At transmitter
$i$, $A_{ij}(t)$ packets are generated with destination the receiver
$j$ during slot $t$. The arrival processes are Markov modulated.


The transmission vector $R(t)$ is determined according to some
transmission scheduling policy. In this work we characterize the
throughput region of the time-varying network, that is, the set
of arrival rates $a_{ij} = E[A_{ij}(t)]$, $i = 1, \ldots, N$, $j = 1, \ldots, M$, for
which the system is stable under some scheduling policy, where
the network is defined to be stable if the expectation of the total
number of packets in the system is uniformly bounded. The
underlying network state process is assumed to be a finite state
space irreducible Markov chain. The probability of state $s_l$ under
the stationary distribution is denoted by $p(s_l)$. The two
main results are the following.

Theorem 1: The necessary and sufficient condition for a
vector $a = (a_{ij} : i = 1, \ldots, N,\ j = 1, \ldots, M)$ to belong to the
throughput region of the system is that there exist nonnegative
numbers $c_{lm}$, $l = 1, \ldots, L$, $m = 1, \ldots, 2^{NM}$, such that

2NJ# 

e/ni  ^  —  1, ..,  L 

for  which  we  can  express  the  arrival  rate  vector  as 
L  2"“ 

«  =  »■"*) 

Issl  masl 

where $r^m$, $m = 1, \ldots, 2^{NM}$, are all the binary vectors with $NM$ elements
and $Q(s_l, r^m) = (Q_{ij}(s_l, r^m) : i = 1, \ldots, N,\ j = 1, \ldots, M)$.

Theorem 2: The policy that schedules at slot $t$ the transmission
vector

$R(t) = \arg\max_{r \in \{0,1\}^{NM}} \sum_{i=1}^{N} \sum_{j=1}^{M} Q_{ij}(S(t), r)\, X_{ij}(t)$

stabilizes the network under the necessary and sufficient condition
of theorem 1.
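
The scheduling rule of theorem 2 can be sketched by brute force for a small network. This is an illustration under stated assumptions rather than the authors' method of computation: the names `max_weight_schedule` and `collision_channel` are hypothetical, the success-probability function is a toy model in which an attempt succeeds only if it is the sole attempt in the slot, and exhaustive search over all $2^{NM}$ vectors is practical only for very small $N$ and $M$.

```python
from itertools import product


def max_weight_schedule(state, backlog, success_prob):
    """Pick the binary transmission vector r maximizing
    sum_ij Q_ij(state, r) * X_ij, by exhaustive search.

    backlog[i][j] is X_ij; success_prob(state, r) returns the matrix Q_ij.
    """
    n_tx, n_rx = len(backlog), len(backlog[0])
    best_r, best_weight = None, float("-inf")
    for bits in product((0, 1), repeat=n_tx * n_rx):
        r = [list(bits[i * n_rx:(i + 1) * n_rx]) for i in range(n_tx)]
        q = success_prob(state, r)
        weight = sum(
            q[i][j] * backlog[i][j]
            for i in range(n_tx)
            for j in range(n_rx)
        )
        if weight > best_weight:
            best_r, best_weight = r, weight
    return best_r, best_weight


def collision_channel(state, r):
    """Toy Q(s, r): a lone attempt succeeds with probability `state`,
    and simultaneous attempts all fail (an assumption for illustration)."""
    attempts = sum(sum(row) for row in r)
    return [
        [state if (x == 1 and attempts == 1) else 0.0 for x in row]
        for row in r
    ]
```

With two transmitters, one receiver, backlogs 3 and 5, and a channel state of 0.9, the rule lets only the transmitter with the longer backlog attempt, for a weight of 4.5.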

The necessity in theorem 1 follows from the fact that in
stable mode the long time average number of packets successfully
transmitted equals the arrival rate. The sufficiency in
theorem 1 is proved by showing that under the policy of theorem
2 the system is stable when the condition of theorem 1
holds. The state of the system is represented by the vector of
packet backlogs in the nodes and the underlying topology state.
First it is shown that the drift of a quadratic function of the
backlog, when it is averaged by the underlying topology state
stationary distribution, is negative. Then it is shown that if, for
a multidimensional Markov chain with infinite and finite valued
components, the drift of a Liapunov function of the infinite
valued components is negative when averaged by the stationary
distribution of the finite valued components, then the chain is
ergodic.

1 Introduction

The queueing problem is investigated for a very wide class of input traffic
models and a good loss probability approximation is obtained.

We first consider ATM (Asynchronous Transfer Mode) queueing,
i.e., G/D/1, and then extend the approximating method to general
queueing problems. The only essential assumption is the stationarity
of the customers' arrival process and the service process.

2  Preliminaries 

We  first  consider  the  discrete  time  ATM  queueing  system.  Every 
cell  which  arrives  at  a  multiplexer  is  served  by  a  single  server  with 
constant  service  time  by  FIFO  (First-In-First-Out)  discipline.  The 
multiplexer  is  assumed  to  have  a  buffer  of  infinite  length.  The  unit  of 
time  is  taken  to  be  the  service  time  of  one  cell. 

Let us denote by $a_t$ the number of cells which arrive in the $t$th
time slot, and by $Q_t$ the queue length at the end of the $t$th time
slot. We assume $Q_{t_0} = 0$ for some $t_0$. The queue length $Q_t$
satisfies the well-known recursion formula
$Q_t = \max(Q_{t-1} - 1, 0) + a_t$ by Lindley [1]. By solving this
recursion, we have the direct expression of $Q_t$ [1]:

$$Q_t = \max_{0 \le i \le t - t_0 - 1} \Bigl( \sum_{j=t-i}^{t} a_j - i \Bigr). \tag{1}$$
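
As a quick sanity check of the relation between Lindley's recursion and the direct expression (1), the following minimal Python sketch compares the two on a random arrival sequence. It assumes the queue is empty before the first slot (that is, $t_0 = 0$ with slots numbered from 1); the function names and the arrival distribution are illustrative choices, not part of the paper.

```python
import random


def queue_by_recursion(arrivals):
    """Q_t = max(Q_{t-1} - 1, 0) + a_t, starting from an empty queue."""
    q, out = 0, []
    for a in arrivals:
        q = max(q - 1, 0) + a
        out.append(q)
    return out


def queue_by_direct_formula(arrivals):
    """Q_t = max over i of (a_{t-i} + ... + a_t) - i, as in expression (1)."""
    out = []
    for t in range(len(arrivals)):
        best, window = None, 0
        for i in range(t + 1):
            window += arrivals[t - i]      # cells arriving in the last i+1 slots
            value = window - i             # at most i of them have been served
            best = value if best is None else max(best, value)
        out.append(best)
    return out


# Illustrative arrival process (an assumption; any non-negative integers work).
random.seed(0)
arrivals = [random.choice([0, 0, 1, 1, 2, 3]) for _ in range(200)]
assert queue_by_recursion(arrivals) == queue_by_direct_formula(arrivals)
```

The direct formula costs quadratic time in this naive form; it is shown only because it is the starting point of the bounds that follow.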

Now, we assume that the arriving cell number $\{a_t\}$ is a stationary
process. Let the initial time $t_0$ tend to $-\infty$ and denote by $Q$
the stationary queue length, and by $N_i$ the stationary number of
arriving cells in an $i$-interval. (Here, we call an interval of length
$i$ an $i$-interval.) Thus, by (1), the queue length $Q$ is written as
$Q = \max_{i \ge 0}(N_{i+1} - i)$, and the cell loss probability
$P[Q > q]$ is given by

$$P[Q > q] = P\bigl[\max_{i \ge 0}(N_{i+1} - i) > q\bigr]. \tag{2}$$


5  General  Input  and  Service  Time 

The  above  idea  can  be  extended  to  obtain  upper  bounds  in  the  cases 
where  the  cell  inter-arrival  time  distribution  is  specified  in  the  ATM 
queueing,  and  more  generally,  extended  to  G/G/1  queueing. 

5.1  Inter-arrival  Time  Distribution 

Let $w_n$ be the waiting time of the $n$th cell, $\tau_n$ the arrival
time of the $n$th cell. The unit of time is, of course, taken to be the
service time of the server. Denote by $t_n = \tau_n - \tau_{n-1}$ the
inter-arrival time between the $n$th and $(n-1)$th cells. Assume that
the random variables $\{t_n\}$ are mutually independent and identically
distributed. Let $p(k) = \mathrm{Prob}[t_n = k]$, $k = 0, 1, \ldots$,
denote the probability distribution of $t_n$ and $\Phi(z)$ the PGF of
$t_n$. Lindley's recursion $w_{n+1} = \max(0, w_n + 1 - t_{n+1})$ leads
to the direct expression $w = \sup_{n \ge 0} V_n$, where
$w = \lim_{n \to \infty} w_n$ and
$V_n = \sum_{j=1}^{n} (1 - t_{j+1}) = n - \sum_{j=1}^{n} t_{j+1}$,
$n > 0$, $V_0 = 0$. We have an upper bound of the cell loss probability
as $P[w > q] \le \sum_{n \ge 1} a_n^{\,n-q-1}\,\Phi^n(a_n^{-1})$, where
$a_n$ is the number that minimizes $a^{\,n-q-1}\,\Phi^n(a^{-1})$,
$a \ge 1$.

5.2  G/G/1 

We  consider  an  extension  of  our  method  to  general  input  and  service 
time  distribution.  No  specific  stochastic  nature  of  the  input  traffic 
and  the  service  time  are  assumed  except  for  stationarity. 

Let us denote by $Q$, $N_i$ and $L_i$ the stationary number of packets
in the queue, that of arriving packets during an $i$-interval and that
of packets served during an $i$-interval, respectively. Further, denote
by $\Phi_{N_i}(z)$ and $\Phi_{L_i}(z)$ the PGF's of $N_i$ and $L_i$,
respectively. Then we have
$P[Q > q] \le \sum_{i \ge 1} a_i^{-(q+1)}\,\Phi_{N_i}(a_i)\,\Phi_{L_{i-1}}(a_i^{-1})$,
where $a_i$ is the number that minimizes
$a^{-(q+1)}\,\Phi_{N_i}(a)\,\Phi_{L_{i-1}}(a^{-1})$, $a \ge 1$.

Furthermore,  we  can  eliminate  the  independence  assumption  of 
the  input  traffic  and  the  service  time  if  the  joint  distribution  of  the 
input  and  the  service  time  is  given. 


In general, however, it is difficult to calculate the exact value of
the right-hand side of (2). The purpose of this paper is to give a good
approximation of (2).

3  Loss  Probability  Upper  Bounds 

It is readily seen from (2) that the following inequality holds.
Lemma 1

$$P[Q > q] \le \sum_{i \ge 1} P[N_i \ge i + q]. \tag{3}$$


6 Heuristic Modification of u(q)


From detailed and extensive numerical comparison between $u(q)$ and
exact formulas or simulation of loss probability, it seems that
$\log P[Q > q]$ is approximated well by $\log u(q)$ + constant. Since
$P[Q > 0] = \rho$, we modify $u(q)$ to define
$\tilde{u}(q) = \frac{\rho}{u(0)}\,u(q)$, where $\rho$ is the link
utilization. We show the numerical calculation results of $u(q)$ and
$\tilde{u}(q)$ in Figures 1-4 to compare them with the exact loss
probability or computer simulation.


Each term $P[N_i \ge i + q]$ on the right-hand side of (3) is
approximated with the aid of the Chernoff bound technique.

Lemma 2 (Chernoff bound) Let $N$ be a random variable taking on
non-negative integral values and $\Phi(z)$ the probability generating
function (PGF) of $N$. Then $P[N \ge r] \le a^{-r}\Phi(a)$ holds for any
integer $r$ and any real number $a \ge 1$.

By applying Lemma 2 to (3), we have
Theorem 1 Let $\Phi_i(z)$ be the PGF of $N_i$, $i = 1, 2, \ldots$. Then,
we have $P[Q > q] \le u(q)$, where
$u(q) = \sum_{i \ge 1} a_i^{-(i+q)}\,\Phi_i(a_i)$ and $a_i$ is the number
that minimizes $a^{-(i+q)}\,\Phi_i(a)$, $a \ge 1$, $i = 1, 2, \ldots$.


4  Application  to  Several  Input  Models 


4.1  M/D/1 

The PGF $\Phi_i(z)$ in Theorem 1 for Poisson traffic of rate $\rho$ is
$\Phi_i(z) = e^{\rho i (z-1)}$, $i = 1, 2, \ldots$. Hence, we have the
upper bound $u(q)$ of the M/D/1 queueing system as

$$u(q) = \sum_{i \ge 1} \Bigl(\frac{\rho i}{i + q}\Bigr)^{i+q} e^{\,q + i - \rho i}$$

for any integer $q \ge 0$.
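
The M/D/1 bound can be evaluated numerically with a few lines of Python. The sketch below is only illustrative: it assumes the Poisson PGF $e^{\rho i (z-1)}$ with $0 < \rho < 1$, for which the Chernoff term $a^{-(i+q)} e^{\rho i (a-1)}$ is minimized at $a = (i+q)/(\rho i)$, and it truncates the infinite sum at a hypothetical `terms` parameter.

```python
import math


def chernoff_term(i, q, rho, a):
    """a**-(i+q) * Phi_i(a) for the Poisson PGF Phi_i(z) = exp(rho*i*(z-1))."""
    return math.exp(-(i + q) * math.log(a) + rho * i * (a - 1))


def md1_upper_bound(q, rho, terms=2000):
    """Truncated u(q) = sum_{i>=1} (rho*i/(i+q))**(i+q) * exp(q + i - rho*i)."""
    total = 0.0
    for i in range(1, terms + 1):
        # Work in the log domain to avoid overflow for large i + q.
        log_term = (i + q) * math.log(rho * i / (i + q)) + (i + q) - rho * i
        total += math.exp(log_term)
    return total
```

Each summand decays geometrically in $i$ when $\rho < 1$, so a modest truncation point suffices for moderate loads; heavier loads need more terms.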

4.2  AR(1)/D/1 


Suppose the input process $\{a_t\}$ is represented approximately as
$a_t = \rho + \xi_t$, $\xi_t = b\,\xi_{t-1} + \epsilon_t$, where $b$ is a
constant, $|b| < 1$, and $\{\epsilon_t\}$ are i.i.d. Gaussian random
variables such that $\epsilon_t \sim \mathcal{N}(0, \sigma^2)$. We have
the  upper  bound  u(9)  for  AR(1)/D/1  as  u(9)  =  Ei>i 


where  aj  = 


62(1-6)2' 


■  26(1  -  6')).  (we  omit  details) 
Performance Analysis for Two Manhattan Street Network
Routing Algorithms


Zheng  Chen  and  Toby  Berger 
School  of  Electrical  Engineering 
Cornell  University 
Ithaca,  NY  14853,  U.S.A. 


The MSN is a two-connected, regular network with
unidirectional links. The links are arranged in a structure
that resembles the one-way system of streets and
avenues in midtown Manhattan; a 16-node MSN is
shown in Figure 1. It has the same number of connections
per node as a bidirectional loop, namely two
inputs and two outputs. The node numbered (i, j) belongs
to row level ring i and column level ring j.


The  choice  of  the  routing  procedure  in  a  MSN 
strongly  impacts  performance.  In  particular,  the  short¬ 
est  path  routing  technique  minimizes  the  transmission 
capacity  used  by  each  packet;  if  the  load  is  balanced 
among  the  node  pairs,  it  may  be  effective  in  maximizing 
the  throughput  of  the  network.  However,  it  necessitates 
extra  time  for  a  routing  table  look-up  for  each  packet 
at  each  node.  For  high  speed  networks  (e.g.,  optical 
nets)  or  heavily  loaded  networks,  this  can  cause  long 
waiting  time  as  many  packets  queue  in  buffers  at  the 
nodes.  Also  the  routing  table  at  each  node  needs  to  be 
updated  when  network  irregularities  occur  because  of 
link  failures  or  network  expansion.  Maxemchuk’s  rout¬ 
ing  rules  need  to  decide  routes  (besides  checking  the 
message  headers)  at  each  node;  some  of  them  are  appli¬ 
cable  to  irregular  networks. 


We consider two routing algorithms that eliminate
the above drawbacks: random routing and a hierarchical
deflection routing. The random routing algorithm
selects the output link of every packet randomly at each
node. It can be performed fast enough to copy packets
using high speed lines. It does not require any knowledge
about the current topology of nodes and totally
eliminates  queuing  delay;  it  is  always  possible  to  switch 
the  input  packets  to  the  output  links  without  conflicts 
provided  the  links  have  the  same  speed.  Moreover,  no 
memory  space  is  needed  at  nodes  for  saving  routing  ta¬ 
bles.  For  the  random  routing  algorithm  we  give  the 
theoretical  steady  state  delay  and  throughput  analy¬ 
sis  for  MSNs  via  a  single  node  approximation  Markov 
Chain  model.  A  simple  iterative  formula  used  for  cal¬ 
culating  the  related  distributions  is  derived,  and  its  ac¬ 
curacy  is  verified  by  simulation.  Not  surprisingly,  the 
network  throughput  of  this  random  routing  is  quite  low 
compared  to  that  of  the  shortest  path  routing.  To  ad¬ 
dress  this  weakness  we  propose  a  hierarchical  deflection 
routing procedure that retains many of the advantages
of  the  random  routing  (for  example,  simplicity  and  no 
need  for  topological  knowledge)  yet  achieves  an  efficient 
network  throughput.  We  analyze  approximate  analyt¬ 
ical  models  of  this  routing  algorithm  under  the  condi¬ 
tions  of  infinite  buffers,  finite  buffers  and  no  buffers  at 
each  node  and  give  a  simple,  iterative  formula  to  calcu¬ 
late  the  steady  state  performance  parameters.  Copious 
simulations  have  been  done,  and  the  results  match  well 
with  the  theory.  We  conclude  with  a  comparison  of  the 
performances  of  shortest  path,  hierarchical  and  random 
routing,  describing  their  individual  characteristics  and 
how  they  can  be  combined  for  enhanced  performance  in 
practical applications.

*This work was supported in part by NSF grants NCR-8903288
and IRI-9005849, and by the K. C. Wong Education
Foundation in Hong Kong.
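
The random routing rule described in the abstract above can be illustrated with a small simulation. The following Python sketch is not the authors' single-node Markov chain model; it simply follows one packet that picks one of the two output links uniformly at random at every node. The link orientation (even rows point east, odd rows west, even columns south, odd columns north, with wrap-around) and all function names are assumptions made for illustration.

```python
import random


def msn_out_links(i, j, n):
    """Two outgoing neighbours of node (i, j) in an n x n MSN (n even).

    Assumed orientation: even rows point east, odd rows west;
    even columns point south, odd columns north; links wrap around.
    """
    row_step = 1 if i % 2 == 0 else -1
    col_step = 1 if j % 2 == 0 else -1
    return [(i, (j + row_step) % n), ((i + col_step) % n, j)]


def random_routing_hops(src, dst, n, rng, max_hops=100000):
    """Number of hops a packet takes when every node picks an output link at random."""
    node, hops = src, 0
    while node != dst and hops < max_hops:
        node = rng.choice(msn_out_links(node[0], node[1], n))
        hops += 1
    return hops


rng = random.Random(1)
samples = [random_routing_hops((0, 0), (2, 3), 4, rng) for _ in range(1000)]
mean_hops = sum(samples) / len(samples)  # empirical mean hop count for this node pair
```

Averaging such hop counts over node pairs gives a rough feel for why random routing wastes link capacity compared with shortest path routing.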
SCHEDULING TRANSMISSIONS IN A MULTICAST PACKET SWITCH
WHEN CALL SPLITTING IS ALLOWED


by

Chanitosh Dixit and Galen Sasaki


Abstract:  Multicast  packet  switching  has  recently  received  con¬ 
siderable  attention  [1,  2,  4,  5].  A  multicast  packet  is  one  that  has 
a  set  of  destinations;  hence,  it  has  applications  in  multiparty  com¬ 
munication  (e.g.  voice  conference  calls,  video  conferencing,  and 
video  distribution  ).  The  switch  considered  has  N  inputs  and 
N  outputs,  as  shown  in  Figure  1,  and  transports  packets  at  the 
inputs  that  are  destined  for  the  outputs.  Time  is  assumed  to  be 
slotted  and  packets  at  inputs  are  transmitted  at  slot  boundaries. 
Packets  transmitted  in  a  given  slot  arrive  at  all  their  destinations 
in  the  same  slot.  The  switch  has  the  call  splitting  ability.  This 
means  that  a  multicast  packet  can  be  duplicated  at  an  input  so 
the  destinations  of  the  copies  form  a  partition  of  the  destinations 
of  the  original  packet.  The  copies  can  then  be  transmitted  at 
different  times.  The  call  splitting  operation  is  assumed  to  require 
negligible  time.  Call  splitting  allows  higher  throughput.  In  fact, 
it  has  been  shown  experimentally  that  it  provides  near  optimal 
throughput  [4].  We  consider  the  problem  of  finding  a  schedule 
of  transmissions  that  delivers  a  set  of  multicast  packets  (or  their 
copies)  to  their  destinations  in  the  minimum  number  of  time  slots. 


Summary: The scheduling problem will be defined more precisely.
It is assumed that there are $V$ packets initially in the
switch. These packets are called the original set. A set of packets
resulting from call splitting none, some, or all of the original set
is called a refinement. Note that a packet $\pi$ that has been call
split into a set of packets $\pi_1, \ldots, \pi_n$ has the property that
the destinations of $\pi_1, \ldots, \pi_n$ form a partition of the
destinations of $\pi$. If $\Pi$ is a refinement and $\sigma$ is a
mapping from $\Pi$ to $\{1, 2, \ldots\}$ then $(\Pi, \sigma)$ is called a
schedule. A schedule $(\Pi, \sigma)$ is called feasible if for each pair
of distinct $\pi_i, \pi_j \in \Pi$, $\sigma(\pi_i) = \sigma(\pi_j)$
implies that packets $\pi_i$ and $\pi_j$ do not share a common
destination. The length of schedule $(\Pi, \sigma)$ is
$\max_{\pi \in \Pi} \sigma(\pi)$.
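
The definitions of feasibility and schedule length translate directly into code. The sketch below is a minimal illustration that checks only the condition stated in the text (packets sharing a slot must not share a destination); the data layout, a list of (destination set, slot) pairs, is an assumption chosen for brevity.

```python
def is_feasible(schedule):
    """schedule: list of (destinations, slot) pairs, one per packet of the refinement.

    Returns True when no two packets assigned to the same slot share a destination.
    """
    used = {}  # slot -> set of outputs already receiving a packet in that slot
    for dests, slot in schedule:
        taken = used.setdefault(slot, set())
        if taken & set(dests):
            return False
        taken |= set(dests)
    return True


def schedule_length(schedule):
    """Length of a schedule: the largest slot index used by any packet."""
    return max(slot for _, slot in schedule)
```

For example, two copies with destination sets {1, 2} and {3} may share slot 1, while a copy destined for output 2 must then wait for a later slot.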

Let $I_i$ be the number of packets in the original set at input $i$;
$O_i$ be the number of packets in the original set destined for output
$i$; $I_* := \max_{1 \le i \le N} I_i$; $O_* := \max_{1 \le j \le N} O_j$;
and $\tau := \max\{I_*, O_*\}$.

Note that $\tau$ is a lower bound on the number of slots required to
deliver the packets to their destinations.

Multicast  Scheduling  with  Call  Splitting  (MSCS)  Prob¬ 
lem.  Find  a  feasible  schedule  with  minimum  length. 

The  problem  can  be  shown  to  be  NP-Complete  by  modifying 
a  proof  used  in  [2]  for  another  NP-Complete  scheduling  problem. 
However,  for  a  restricted  class  of  instances,  the  problem  has  poly¬ 
nomial  time  complexity. 

Theorem 1. Consider the MSCS Problem with the following
additional conditions: each input has either (i) at most one
multicast packet or (ii) a set of unicast packets (i.e., packets with one
destination).  Then  the  problem  has  time  complexity  0{N^)  and 
minimum schedule length $\tau$.

The  theorem  can  be  proven  by  transforming  the  problem  into 
a  polynomial  scheduling  problem  described  in  [3]. 
The MSCS Problem is NP-Complete but our simulations
show that simple scheduling strategies are likely to find schedule
lengths of $\tau$. The next theorem is an attempt to explain this
under the following probabilistic assumptions.

Assumptions  1.  Each  packet  of  the  original  set  is  equally  likely 
to  be  at  one  of  the  inputs  and  these  locations  are  independent  of 
one  another.  A  packet  will  choose  an  output  as  one  of  its  desti¬ 
nations  with  probability  p  and  the  probability  is  independent  of 
other  outputs  and  packets.  Hence,  the  average  number  of  desti¬ 
nations  of  a  packet  is  pN. 

Theorem  2.  There  is  an  Algorithm  A  that  produces  a  feasible 
schedule  of  length  Tji  in  0{(V  -h  N)N^)  time  with  the  following 
property.  Suppose  Assumptions  1  are  true,  V  >  -^NlogN,  and 
P  >  where  c  >  1  and  t  e  (0,1)  are  constants.  Then 

[^  >  1  +  Un]  is  0(VN-%  where  (l  +  jf)- 

(Note  that  — ♦  (1  -I-  e)’  as  JV  — ♦  oo.) 

Algorithm  A.  First,  divide  the  V  packets  in  the  original  set  into 
$I_*$ subsets, called bins. The sizes of the bins are required to be in
and  each  bin  has  at  most  one  packet  for  each  input. 
Second, find a minimum length schedule for each bin. Greedy
scheduling finds the minimum length schedule because there is
at most one packet per input. Finally, concatenate the schedules
together to form the final schedule. ■
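
The second step of Algorithm A, greedy scheduling of a single bin, can be sketched as follows. This is an illustrative reading rather than the authors' exact procedure: it assumes each input in the bin holds at most one packet, that an input may send one copy per slot to any subset of its remaining destinations, and that an output accepts one packet per slot. Under those assumptions each output simply serves its requesting packets in consecutive slots, so the bin's schedule length equals the largest number of packets destined for any one output.

```python
from collections import defaultdict


def greedy_bin_schedule(bin_packets):
    """bin_packets: dict mapping input -> set of destination outputs.

    Returns a dict mapping slot -> list of (input, destinations of the copy sent).
    """
    slots_used = defaultdict(int)                    # output -> slots already taken
    copies = defaultdict(lambda: defaultdict(set))   # slot -> input -> destinations
    for inp in sorted(bin_packets):
        for out in sorted(bin_packets[inp]):
            slots_used[out] += 1                     # earliest free slot at this output
            copies[slots_used[out]][inp].add(out)    # call-split: this copy goes in that slot
    return {
        slot: sorted((inp, sorted(dests)) for inp, dests in by_input.items())
        for slot, by_input in copies.items()
    }
```

Concatenating the per-bin schedules, as in the final step of Algorithm A, then amounts to offsetting the slot numbers of each bin by the total length of the bins before it.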

The  proof  of  Theorem  2  has  two  parts.  First,  P[t  < 
pV(\  —  /,,Ar)]  is  shown  to  be  0  using  the  Chemoff  and 

union  bounds.  Finally,  P  [t^  >  pV  j  jg  shown  to  be 

0{VN~‘).  This  is  derived  by  showing  that  with  probability 
1  -  <  h  -  ^  <  and  with  probability 

1  —  0(A?^ ■'■*■’),  a  bin  has  schedule  length  at  noost  ii/^’yP(^  +  1)- 
The  last  two  probabilistic  results  can  be  derived  using  the  Cher- 
noff  bound,  the  union  bound,  and  Theorem  1. 
Figure 1
In this paper we investigate n-inlet, n-outlet
networks having, at each node, a two-inlet, two-outlet
(2 × 2) switch with two states,

possessing  the  standard-path  property.  This  property 
is  the  following: 

given any inlet i and outlet j of the network,
there is a fixed path through the network, denoted
i → j and called a standard path, from inlet i to
outlet j, which is always free (carrying no signal
from any other inlet to any other outlet) provided
inlet i and outlet j are free, and existing
connections have been made using standard paths. We
call a network with this property a standard-path
network (SPN).


An  SPN  is  therefore  a  special  kind  of  (wide-sense) 
non-blocking  network:  a  free  inlet  can  certainly  be 
connected  to  a  free  outlet  without  disturbing  existing 
connections.  However,  an  SPN  also  has  the  advantage 
that  a  signal  can  be  routed  through  the  network  without 
regard  to  the  network  state:  only  the  free  inlet  and 
free  outlet  to  be  connected  need  be  known.  The  signal 
can  therefore  be  self-routing. 

We  have  found  SPNs,  staged  SPNs  and  staged  planar 
SPNs  having  the  minimal  numbers  of  switches: 

THEOREM  1.  Any  n  x  n  standard-path  network  has  at 
least  n  -  switches.  This  number  is  best 

possible:  there  are  n  x  n  SPNs  with  n^  - 

switches achieving these lower bounds. (See Figure for
n = 8).

THEOREM 2. (see also [1]) Any n × n staged
standard-path network has at least $n^2 - n - 1$ (n even)
and $n^2 - n$ (n odd) 2 × 2 switches. Furthermore, these
lower bounds are always achieved.


One can then show that in any SPN we can divide the
switches of the network into five types (0, ±1, 2, 3)
according as to how they alter, on output, the labels
on their inlets.

The  design  of  minimal  staged  networks  can  be 
formulated  in  terms  of  a  game: 

The match game and minimal staged networks
We now describe a solitaire game, played on a grid
of $n^2$ points (i, j) (i, j = 1, 2, ..., n), with
n(n-1) matches. Initially, the matches are all in a
vertical position, i.e. with endpoints
(i, j), (i, j+1) (i = 1, 2, ..., n;
j = 1, ..., n-1). The aim is to position all the
matches horizontally (i.e. with endpoints
(i, j), (i+1, j) (i = 1, ..., n-1; j = 1, ..., n))
using as few moves as possible.

The  allowable  moves  are  as  follows: 

1.  Remove  a  vertical  match,  or  add  a  horizontal 
match. 

2.  Rotate  a  vertical  match  through  90°  about  one  of 
its  endpoints. 

3.  Replace  an  adjacent  vertical  pair  of  matches  by  an 
adjacent  horizontal  pair  of  matches  with  the  same  four 
endpoints . 

These moves may only be used in positions where no
"right  angle  of  matches"  (a  horizontal  match  and  a 
vertical  match  with  a  common  endpoint)  is  created. 

The correspondence between this game and the
n × n staged network is as follows: The point (i, j)
represents the label i:j, and the connected
components of the graph defined by the matches
represent the current label sets. Each move
corresponds to a switch:

move  1  corresponds  to  a  type  1  or  type  -1  switch; 

move  2  corresponds  to  a  type  2  switch; 

move  3  corresponds  to  a  type  3  switch. 

There  is  a  corresponding  game  using  coins  which 

corresponds  to  minimal  staged  planar  networks. 


THEOREM  3.  Any  n  x  n  planar  staged  standard-path 
network  has  at  least 


2 × 2 switches. Furthermore, these numbers M of
switches are achieved for n = 2, 3, 6, 7, 10, 11, ...,
while for n = 4, 5, 8, 9, 12, 13, ... , 1 is
achieved.

Proofs of the theorems appear in [2].

For an n × n SPN with inlets 1, 2, ..., n and
outlets 1, 2, ..., n, we choose a typical inlet i,
outlet j and standard path i → j. All edges of
the network on this path i → j will be given the
label element i:j. The label of an edge is the set
of all such label elements (for all standard paths
i → j through the edge).

For any integers m, ℓ with 1 ≤ m, ℓ ≤ n, let $\mathbf{m}$ or
$\boldsymbol{\ell}$ denote some m- or ℓ-element subset of
{1, 2, ..., n}, and i:$\mathbf{m}$ (respectively $\boldsymbol{\ell}$:j)
denote the labels {i:j | j ∈ $\mathbf{m}$} (resp.
{i:j | i ∈ $\boldsymbol{\ell}$}). Then
LEMMA Every edge label in an SPN is either of the
form i:$\mathbf{m}$ or $\boldsymbol{\ell}$:j for some subsets
$\boldsymbol{\ell}$ or $\mathbf{m}$ of {1, 2, ..., n}.


[1] H.D.L. Hollmann and J.H. van Lint Jr, "Nonblocking
self-routing switching networks", Discrete Applied
Mathematics, vol. 37/38, pp. 319-340, 1992.

[2] Liam Halpenny and C.J. Smyth, "A classification of
minimal standard-path 2×2 switching networks",
Theoretical Computer Science, vol. 102,
pp. 329-354, 1992.


Figure.  A  minimal  8x8  SPN  with  52  switches. 
BINARY TREES FOR CLASSIFICATION, REGRESSION, AND CLUSTERING,
WITH APPLICATIONS TO LOSSY DATA COMPRESSION


Richard A. Olshen
Division of Biostatistics
Stanford University School of Medicine
Stanford, California 94305-5092


ABSTRACT 

This talk is a survey of binary tree-structured methods
for classification, regression, survival analysis, and clustering.
The discussion will include a survey of unifying
themes, together with applications, and an introduction
to mathematical issues that arise in studying their asymptotic
properties. There will be special emphasis on the
CART™ algorithms of Breiman et al., and on applications
of the clustering algorithms to predictive, pruned,
tree-structured vector quantization (predictive PTSVQ).
The talk is a summary of collaborations with many authors
over an eighteen year period.

Binary tree-structured statistical methods have found
wide applicability in recent years. Areas of application
have included computer-aided diagnosis and prognosis in
medicine ([1], [5]); speech recognition, ship recognition [1];
prediction in economics and finance; the search for promoter
regions in genetics; particle identification in physics;
and, perhaps especially for this audience, lossy data compression
[3] in digital radiography [2]. There are predictors
(features), $X$, and a response, $Y$. There is a learning sample
$\mathcal{L} = \{(X_i, Y_i) : i = 1, \ldots, n\}$, possibly independent and
identically distributed, or at least with the $Y_i$'s conditionally
independent given the $X_i$'s (see Chapter 12 of [1]).
We use $\mathcal{L}$ to infer an unknown future $Y^*$ from its
corresponding known $X^*$. In some cases we estimate the entire
conditional distribution $F(\cdot \mid X^* = x^*)$ of $Y^*$, even when
for some $(X_i, Y_i)$ pairs in $\mathcal{L}$, $Y_i$ is "censored." If the range
of $Y^*$ is finite and the goal is to predict its value, then the
problem is one of "classification" (or "discrimination"). If
$Y^*$ is real, the problem is "regression." If $Y^*$ is real, and
the goal is to estimate $P(Y^* < y \mid X^* = x)$, then the problem
is "survival analysis." PTSVQ can be viewed as an
approach to successive 2-means clustering; $X^* = Y^*$ is
Euclidean, and we want to predict $Y^*$; but the complexity
(bit rate) of the predictor is constrained.

Algorithms involve successively partitioning, that is to
say "splitting," the range of $X$ (the "feature space"). At
least when $X \in \mathbb{R}^d$ the partitioning is by hyperplanes.
Results of this "recursive partitioning" can be summarized by
a binary tree; $X^*$ is passed from the root node successively
to a terminal node. There is a rule, that typically is constant
on each terminal node, by which $Y^*$ is predicted.
The rule can be an estimated Bayes rule, as in classification,
or a centroid of members of the terminal node, as in
PTSVQ. Splitting is always "greedy," one node at a time.
In order to obtain some possible benefits of "lookahead,"
which these algorithms do not have, we grow larger trees
than we intend to use and prune them back [1] on the basis
of a validation sample; internal validation (typically
cross-validation); or, in the case of PTSVQ, a bit rate constraint
([3], [2]).
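
To make the notion of a single greedy split concrete, here is a minimal Python sketch for one real-valued feature and a regression response. It is not the CART implementation: it assumes squared-error impurity, considers only thresholds midway between consecutive distinct feature values, and uses illustrative names throughout.

```python
def best_split(xs, ys):
    """Return (threshold, sse) of the split x <= threshold minimising total squared error.

    Returns None when all feature values are equal and no split is possible.
    """
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot separate equal feature values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        cost = sse([y for _, y in pairs[:k]]) + sse([y for _, y in pairs[k:]])
        if best is None or cost < best[1]:
            best = (threshold, cost)
    return best
```

Growing a tree repeats this search on each child node in turn, which is what makes the procedure greedy; the pruning step described above is what compensates for the lack of lookahead.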

Versions of these algorithms can be shown to be "consistent"
in various senses; Bayes risk consistent for classification,
and almost surely consistent for regression,
clustering, and survival analysis. See, for example, [1], [4],
and [6].


The  talk  will  include  a  report  on  applications  of  PTSVQ 
to  problems  of  data  compression  in  digital  radiography. 
The  studies  were  undertaken  by  a  group  of  engineers,  ra¬ 
diologists,  and  statisticians  at  Stanford  University  (see  [2]). 
Abstract - In this paper, a private-key encryption technique is proposed,
which makes use of binary linear block burst-error-correcting codes and is
based on the fact that such codes have an error control capacity related to
bursts which is, in general, larger than its random error control
capacity.


Encryption techniques based on algebraic codes have been proposed
for both public and private-key cryptosystems. Specifically, McEliece
[1] introduced a public-key cryptosystem based on t-error correcting Goppa
codes and on the fact that efficient decoding algorithms exist for such
codes while that is not true for a linear code in general. To be effective,
the system needs a large blocklength code ($n \sim 10^3$) which is capable of
correcting a large number of random errors ($t \sim 40$). Besides that,
McEliece's scheme results in a substantial data expansion. More recently,
Rao and Nam [2], [3] introduced a private-key encryption similar to
McEliece's, the difference being the fact that the code generator matrix
was kept secret. This allowed the use of simpler codes, while keeping the
security level.

In this paper a private-key cryptosystem is introduced, which makes
use of binary linear block burst-error-correcting codes. The main goal is to
construct a secure system which employs simple error control codes, based
on the fact that the burst-correcting capacity of a code is, in general,
larger than its random error correcting capacity.

In what follows, B(n,k,d,b) denotes a binary linear block burst-correcting
code with length n, dimension k, minimum Hamming distance d,
capable of correcting bursts of length up to b. By a burst of length b, it is
meant a binary error vector of length n whose nonzero components are
confined to b consecutive positions, the first and last of which are nonzero.
It is assumed that b > t, where d = 2t + 1.

Denoting by G the generator matrix of B(n,k,d,b), the system
designer chooses an n × n permutation matrix P. From these, the
enciphering and deciphering operations related to the plaintext M proceed as
follows:

i - Enciphering


$C = (MG + E_b)P$

where $E_b$ denotes a burst of length b and Hamming weight > t, randomly
generated at the transmitter. M and C (the ciphertext) are binary vectors of
length k and n, respectively.

ii - Deciphering

Step 1 - Compute


$C' = CP^T$


where $P^T$ is the transpose of P, to obtain

$C' = MG + E_b$

Step 2 - Decode $C'$, i.e., remove $E_b$, via some decoding algorithm.
That recovers the plaintext M.
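
The enciphering map and Step 1 of deciphering can be illustrated over GF(2) with plain Python lists. The sketch below makes several assumptions that are not in the paper: the permutation matrix P is represented as an index list `perm`, the burst generator does not enforce the Hamming weight > t condition, and no decoder for Step 2 is included. All names are hypothetical.

```python
import random


def vec_mat_gf2(vec, mat):
    """Row vector times matrix over GF(2); mat is a list of rows."""
    return [sum(v & m for v, m in zip(vec, col)) % 2 for col in zip(*mat)]


def random_burst(n, b, rng):
    """Length-n error vector whose nonzero entries span exactly b consecutive positions."""
    start = rng.randrange(n - b + 1)
    burst = [0] * n
    burst[start] = burst[start + b - 1] = 1          # first and last positions are nonzero
    for pos in range(start + 1, start + b - 1):
        burst[pos] = rng.randint(0, 1)               # interior bits are free
    return burst


def encipher(message, G, perm, burst):
    """C = (M G + E_b) P, with P moving coordinate i to position perm[i]."""
    word = [c ^ e for c, e in zip(vec_mat_gf2(message, G), burst)]
    cipher = [0] * len(word)
    for i, bit in enumerate(word):
        cipher[perm[i]] = bit
    return cipher


def undo_permutation(cipher, perm):
    """C' = C P^T, which restores the original coordinate order (Step 1)."""
    return [cipher[perm[i]] for i in range(len(cipher))]
```

After `undo_permutation`, the receiver holds $MG + E_b$ and would hand it to a burst-correcting decoder for the chosen code, which is the part this sketch leaves out.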


Considering that the system implementation is simple, it is
necessary to focus on questions related to its security. Apparently, the
cryptanalyst has, at his disposal, at least two main approaches to attack the
system, namely

i) To find the matrices G and P from known pairs (M,C).

ii) To recover M, after intercepting C, from chosen pairs (M,C).

The infeasibility of (i) via any exhaustive search type of procedure
is clear, because of the large number of choices for the matrices G and P.
Besides that, the known plaintext attack is also difficult to implement, since
the solution of the equations related to the column vectors of G' (G' = GP)
requires a large number of known pairs and this may be prevented by timely
changes on the keys used.

To recover M from C, as suggested in (ii), means, first, to find G'
from a sufficient number of chosen pairs and, second, considering

$M = m_1 \ldots m_k$

$C = c_1 \ldots c_n$

and

$G = [g_{ij}]$, $1 \le i \le k$, $1 \le j \le n$

to solve, for M, the system

$c_1 = m_1 g_{11} + \ldots + m_k g_{k1} + e_{b_1}$

$c_2 = m_1 g_{12} + \ldots + m_k g_{k2} + e_{b_2}$

$c_n = m_1 g_{1n} + \ldots + m_k g_{kn} + e_{b_n}$

The point here is not only that the solution of the above system
requires a computational complexity of the order of k , but also, and
perhaps more important, the fact that finding this solution is equivalent
to decoding, using a t-error-correcting code, a received vector corrupted by
an error vector of weight greater than t. Therefore, the system security relies
not only on the difficulty of decoding a general linear code, as in the
McEliece scheme, but also on the difficulty of correcting a number of
errors which is beyond the error-correcting capacity of a given code.


This work received partial support from the Brazilian Science and
Research Council - CNPq and Banco do Brasil.
Universal classes of hash functions were introduced by
Carter and Wegman [1] and were studied further by Sarwate
[2], Wegman and Carter [8] and recently by Stinson
[6], [7]. Stinson has found the connections between combinatorial
designs and universal hashing. He has proved new
lower bounds on the size of universal$_2$ classes of hash functions.

In this paper we study universal$_t$ classes of hash functions for
$t > 2$. The case $t = 2$ has been investigated in [1], [2], [6], [7],
[8]. We present some characterizations of universal$_t$ classes
of hash functions in terms of combinatorial designs and orthogonal
arrays and the application of universal$_t$ classes of
hash functions to the construction of authentication codes.

Let $A$ and $B$ be finite sets, where $|A| > |B|$. A function
$h : A \to B$ will be called a hash function. Let $h$ be a hash
function and let $t \ge 2$ be an integer. For a set of $t$ pairwise
distinct elements $x_1, \ldots, x_t \in A$, define $\delta_h(x_1, \ldots, x_t) = 1$
if $h(x_1) = \ldots = h(x_t)$, and $\delta_h(x_1, \ldots, x_t) = 0$ otherwise.
For a finite set $H$ of hash functions, define
$\delta_H(x_1, \ldots, x_t) = \sum_{h \in H} \delta_h(x_1, \ldots, x_t)$.
The following are definitions of classes
of hash functions.

1. Let $\epsilon$ be a positive real number. $H$ is $\epsilon$-almost universal$_t$
(or $\epsilon$-AU$_t$) if $\delta_H(x_1, \ldots, x_t) \le \epsilon |H|$ for all $t$
pairwise distinct elements $x_1, \ldots, x_t \in A$.

2. Let $\epsilon$ be a positive real number. $H$ is $\epsilon$-almost
strongly-universal$_t$ (or $\epsilon$-ASU$_t$) if the following two conditions are
satisfied:

(a) for every $x \in A$ and for every $y \in B$,
$|\{h \in H : h(x) = y\}| = |H|/|B|$.

(b) for every set of $t$ pairwise distinct elements $x_1, \ldots, x_t \in A$,
and for every $y_1, \ldots, y_t \in B$,

=  =  WIBl-*. 

First, we state a bound on $\delta_H(x_1, \ldots, x_t)$, that is a
generalization of Theorem 1.1 [ ].

Theorem 1. For any class $H$ of hash functions from $A$ to
$B$ and for any integer $t \ge 2$, there exist $t$ pairwise distinct
elements $x_1, \ldots, x_t \in A$, such that

Mxi . 

where $a = |A|$ and $b = |B|$.

Two special cases of our definitions are optimally-universal$_t$ and strongly-universal$_t$ classes of hash functions, which are defined as follows:

$H$ is optimally-universal$_t$ (or OU$_t$) if $\delta_H(x_1, \dots, x_t) = |H| \, b \binom{a/b}{t} / \binom{a}{t}$ for all $t$ pairwise distinct elements $x_1, \dots, x_t \in A$.

$H$ is strongly-universal$_t$ (or SU$_t$) if for every set of $t$ pairwise distinct elements $x_1, \dots, x_t \in A$ and for every $y_1, \dots, y_t \in B$, $|\{h \in H : h(x_1) = y_1, \dots, h(x_t) = y_t\}| = |H|/|B|^t$.
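
The definitions of $\delta_H$ and of a strongly-universal$_t$ class can be checked by brute force on small examples. The following Python sketch is only an illustration: the family $h_{a,b,c}(x_1, x_2) = a x_1 + b x_2 + c \bmod p$ from $Z_p \times Z_p$ to $Z_p$ is a hypothetical example chosen here, not a construction from the paper, and the function names are assumptions.

```python
from itertools import combinations, product

def delta_H(H, xs):
    """Number of functions in H that map every point of xs to the same value."""
    return sum(1 for h in H if len({h[x] for x in xs}) == 1)

def is_strongly_universal(H, A, B, t):
    """Check |{h : h(x_i) = y_i for all i}| = |H| / |B|^t for all choices."""
    target, rem = divmod(len(H), len(B) ** t)
    if rem:
        return False
    for xs in combinations(A, t):
        counts = {}
        for h in H:
            key = tuple(h[x] for x in xs)
            counts[key] = counts.get(key, 0) + 1
        if any(counts.get(ys, 0) != target for ys in product(B, repeat=t)):
            return False
    return True

# Hypothetical example family: affine maps from Z_p x Z_p to Z_p.
p = 3
A = list(product(range(p), repeat=2))
B = list(range(p))
H = [{x: (a * x[0] + b * x[1] + c) % p for x in A}
     for a, b, c in product(range(p), repeat=3)]

print(is_strongly_universal(H, A, B, 2))
print(delta_H(H, (A[0], A[1])), len(H) // len(B))
```

Each function is stored as a dictionary so that the checks are plain table lookups; this only scales to very small sets.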

A $t$-$(v, k, \lambda)$ design is a pair $(X, \mathcal{B})$ where $X$ is a set of $v$ elements (called points) and $\mathcal{B}$ is a family of $k$-subsets of $X$ (called blocks) such that every $t$-subset of $X$ is contained in exactly $\lambda$ blocks. A $t$-$(v, k, \lambda)$ design is resolvable if the blocks can be partitioned into $r = \lambda \binom{v-1}{t-1} / \binom{k-1}{t-1}$ parallel classes, each of which consists of $v/k$ blocks that partition the set of points.

An orthogonal array $OA_\lambda(t, n, k)$ is a $\lambda n^t \times k$ array of $n$ symbols such that every set of $t$ columns contains every ordered $t$-set of symbols exactly $\lambda$ times.

For descriptions of unconditional authentication codes, we refer to the papers of Simmons and Stinson (see e.g. [3], [4], [5], [7]).

Our  further  results  are  presented  in  the  following  theorems. 

Theorem 2. If there exists a resolvable $t$-$(v, k, \lambda)$ design, then there exists an OU$_t$ class $H$ of hash functions from $A$ to $B$, where $|A| = v$, $|B| = v/k$ and $|H| = r = \lambda \binom{v-1}{t-1} / \binom{k-1}{t-1}$. Conversely, if there exists an OU$_t$ class $H$ of hash functions from $A$ to $B$, where $a = |A|$ and $b = |B|$, then there exists a resolvable $t$-$(v, k, \lambda)$ design, where $v = a$, $k = a/b$ and $\lambda =$

Theorem 3. If there is an orthogonal array $OA_\lambda(t, n, k)$, then there exists an SU$_t$ class $H$ of hash functions from $A$ to $B$, where $|A| = k$, $|B| = n$ and $|H| = \lambda n^t$. Conversely, if there exists an SU$_t$ class $H$ of hash functions from $A$ to $B$, where $a = |A|$ and $b = |B|$, then there exists an $OA_\lambda(t, n, k)$, where $n = b$, $k = a$ and $\lambda = |H|/n^t$.
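
The correspondence in Theorem 3 can be made concrete by writing each hash function as a row of values. The sketch below is an illustration under stated assumptions: the affine family $x \mapsto ax + b \bmod p$ on $Z_p$ is a hypothetical example (with $|A| = |B| = p$), and `oa_index` is a name invented here. It builds the $|H| \times |A|$ array and checks the orthogonal array property directly.

```python
from collections import Counter
from itertools import combinations, product

def oa_index(rows, symbols, t):
    """Return lambda if rows form an orthogonal array of strength t, else None."""
    k = len(rows[0])
    lam, rem = divmod(len(rows), len(symbols) ** t)
    if rem or lam == 0:
        return None
    for cols in combinations(range(k), t):
        seen = Counter(tuple(row[c] for c in cols) for row in rows)
        if any(seen[ys] != lam for ys in product(symbols, repeat=t)):
            return None
    return lam

# One row per hash function h_{a,b}, one column per element x of A = Z_p.
p = 5
rows = [[(a * x + b) % p for x in range(p)] for a in range(p) for b in range(p)]
lam = oa_index(rows, list(range(p)), 2)
print(lam, len(rows) == lam * p ** 2)
```

Read in the other direction, each row of an orthogonal array defines one hash function from the column indices to the symbols.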

Theorem 4. If there exists an $\epsilon$-ASU$_t$ class $H$ of hash functions from $A$ to $B$, then there exists an authentication code for $|A|$ source states, $|B|$ authenticators and $|H|$ encoding rules, such that $P_{d_0} = 1/|B|$ and $P_{d_i} \le \epsilon$, $i = 1, \dots, t$.

Using  combinatorial  designs  many  families  of  hash  functions 
in  the  above  theorems  have  been  constructed. 

Abstract

When a shadow of a threshold scheme is publicized, new shadows have to be reconstructed and redistributed in order to maintain the same level of security. In this paper we consider threshold schemes with disenrollment capabilities where the new shadows can be created by broadcasts through a public channel. We establish a lower bound on the size of each shadow in a scheme that allows $L$ disenrollments. We exhibit three systems that achieve the lower bound on shadow size.

Summary 

In safeguarding a secret, there are many situations where two or more guardians provide more security than only one. Common examples can be found in safe deposit boxes and in the control of nuclear weapons. In these cases, two keys are needed to activate the control mechanism; the ability to exercise shared control is lost if either key is lost or either key's owner is incapacitated. To guard against such a loss, copies of keys or instructions may be made and distributed to different parties. However, increasing the number of distributed copies increases the risk of some copy being compromised, reducing the security of the system. By distributing "shadows" of a shared secret (which can be used as a key), threshold schemes allow shared control without risking compromise of the secret.

A $(t, n)$ threshold scheme distributes partially redundant shadows $S_1, \dots, S_n$ among $n$ users so that any $t$ or more shadows uniquely determine the secret $K$. Using the entropy function $H(X)$ introduced by Shannon, we have the following definitions.

DEFINITION 1. A $(t, n)$ threshold scheme is a collection of random variables $(K, S_1, \dots, S_n)$ such that for any $1 \le i_1 < i_2 < \dots < i_j \le n$,

$$H(K \mid S_{i_1}, \dots, S_{i_j}) = 0 \quad \forall j \ge t, \qquad (1)$$

$$H(K \mid S_{i_1}, \dots, S_{i_j}) > 0 \quad \forall j < t. \qquad (2)$$

Condition  (1)  says  that  every  set  of  t  or  more  shadows  deter¬ 
mines  the  secret  uniquely,  whereas  condition  (2)  indicates  that  the 
secret  cannot  be  uniquely  determined  by  fewer  than  t  shadows. 

A $(t, n)$ threshold scheme is said to be perfect if

$$H(K \mid S_{i_1}, \dots, S_{i_j}) = H(K) \quad \forall j < t. \qquad (3)$$

Condition (3) says that knowledge of fewer than $t$ shadows does not reduce one's uncertainty about the secret.
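
Conditions (1) and (3) can be verified numerically on a toy example. The sketch below is an illustration, not a scheme from the paper: it assumes a one-bit secret $K$, a uniformly random shadow $S_1$ and $S_2 = K \oplus S_1$, which gives a $(2, 2)$ scheme, and it computes the conditional entropies from the joint distribution. The helper names are assumptions made here.

```python
from collections import defaultdict
from itertools import product
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a distribution {outcome: probability}."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def conditional_entropy(joint, target, given):
    """H(target | given); target and given are tuples of coordinate indices."""
    both = defaultdict(float)
    cond = defaultdict(float)
    for outcome, p in joint.items():
        g = tuple(outcome[i] for i in given)
        both[tuple(outcome[i] for i in target) + g] += p
        cond[g] += p
    return entropy(both) - entropy(cond)

# Outcomes are (K, S1, S2) with K and S1 uniform bits and S2 = K xor S1.
joint = {(k, s1, k ^ s1): 0.25 for k, s1 in product((0, 1), repeat=2)}

h_k = conditional_entropy(joint, (0,), ())            # H(K)
h_k_s1 = conditional_entropy(joint, (0,), (1,))       # H(K | S1)
h_k_s1s2 = conditional_entropy(joint, (0,), (1, 2))   # H(K | S1, S2)
print(h_k, h_k_s1, h_k_s1s2)
```

One shadow leaves the full uncertainty about $K$, as required by (3), while both shadows together remove it, as required by (1).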

The disclosure of a shadow decreases the security against collusion of a threshold scheme since every $t - 1$ remaining shadows, together with the disclosed shadow, determine the secret. Thus, the threshold is reduced from $t$ to $t - 1$. In order to maintain the same threshold $t$, the key must be changed and the shadows modified. One way to do this is to design a new $(t, n)$ scheme where shadows are then distributed through secure channels. The security of the new system is not compromised if the new shadows are independent of the disclosed shadow. However, setting up the secure channels for distributing shadows can be expensive. This paper considers schemes which distribute modifications to existing shadows through insecure channels. Such a scheme is said to have a disenrolling capability.

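DEFINITION 2. A $(t, n)$ threshold scheme with $L$-fold disenrollment capability is a collection of random variables $(K_0, K_1, \dots, K_L, S_1, \dots, S_n, P_1, \dots, P_L)$ such that for each $i$, $i = 0, \dots, L$,

$$H(K_i \mid A_i(k), P_1, \dots, P_i) = 0 \quad \forall k \ge t, \qquad (4)$$

$$H(K_i \mid A_i(k), P_1, \dots, P_i, S_1, \dots, S_i) > 0 \quad \forall k < t, \qquad (5)$$

where $A_i(k) = \{S_{j_1}, \dots, S_{j_k}\} \subseteq \{S_{i+1}, S_{i+2}, \dots, S_n\}$.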

In order to minimize the cost of distributing shadows through secure channels, we wish to minimize the number of bits required to encode each shadow. It is conceivable that a $(t, n)$ threshold scheme with higher disenrollment capability requires higher overhead for encoding the shadows. We show that this is indeed the case by establishing a lower bound on the number of bits required to encode a shadow that grows linearly with the number $L$ of disenrollments.

THEOREM. Let $(K_0, K_1, \dots, K_L, S_1, \dots, S_n, P_1, \dots, P_L)$ be a perfect $(t, n)$ threshold scheme with $L$-fold disenrollment capability. If $H(K_i) = m$, for $i = 0, \dots, L$, then

$$H(S_j) \ge (L + 1) m \quad \forall j = 1, \dots, n.$$

We consider three examples of optimal threshold schemes with $L$-fold disenrollment capability, each of which achieves the above lower bound. The Brickell-Stinson scheme [3] makes use of one-time pads, the nonrigid hyperplane scheme [2] is based on geometric properties of hyperplanes and the Martin scheme [5] employs threshold schemes with higher thresholds.
Abstract: Gunther gave an algorithm for homophonic coding of messages for cryptographic purposes which was based on known source statistics. In this paper we give an adaptive homophonic algorithm with short delay for a discrete memoryless source with unknown statistics, based on Gunther's algorithm. We give a formula for the individual redundancy as well as a bound for the maximum redundancy. Finally a comparison with universal source coding and classical ciphers is made.

Summary: The unicity distance is an important measure for the strength of a cipher. The unicity distance depends on the redundancy, and one possible way to strengthen a cryptoalgorithm is source coding. Another approach is homophonic coding.

Let $M$ denote a discrete memoryless source emitting symbols from the alphabet $\mathcal{M}$ of $L = |\mathcal{M}|$ letters; $m^n = m_1, m_2, \dots, m_n$ a sequence of $n$ letters from the alphabet $\mathcal{M}$; $\mathcal{M}^n$ the set of all possible $m^n$; $p(m)$ the probability of occurrence of the letter $m \in \mathcal{M}$ of the source $M$; $\hat{p}(m \mid m^{n-1})$ an estimation for the probability of the letter $m \in \mathcal{M}$ given the sequence $m^{n-1}$; $E$ a way for calculating $\hat{p}(m \mid m^{n-1})$.

Denote by $R(m^n \mid E)$ the total individual redundancy after coding given $E$, when the sequence $m^n$ is coded by the proposed algorithm; $R_n(m^n \mid E)$ the per letter redundancy for the same sequence; similarly $R(M \mid E)$ and $R_n(M \mid E)$, respectively, the ordinary average redundancy for a source $M$.

Given an estimation $E$, the following algorithm realizes a letter by letter adaptive binary homophonic coding for $M$ according to Gunther's algorithm [1].

$s := 0$

1) $i := 1$, $s := s + 1$, read a new letter $m_s \in \mathcal{M}$; $\forall m \in \mathcal{M}$, calculate $\hat{p}(m \mid m^{s-1})$, and let $p^{(0)}(m \mid m^{s-1}) := \hat{p}(m \mid m^{s-1})$.

2) $K_i := \lceil -\log(\max_{m \in \mathcal{M}} p^{(i-1)}(m \mid m^{s-1})) \rceil$.


ing  a  sequence  m”  with  the  algorithm  above  given  E  is: 


tv  ■ 

•si 


p(tn. 

P(n»,)  ’ 


(1) 


The theorem states that the redundancy for a given source only depends on the choice of $E$. Hence it is then easy to calculate the average redundancy. Next we select an adequate $E$. According to results from Shtarkov [3] a natural choice is

$$\hat{p}(m \mid m^{s-1}) = \frac{t_m(m^{s-1}) + 1/2}{s - 1 + L/2} \qquad (2)$$

where $t_m(m^{s-1})$ is the number of occurrences of the letter $m$ in $m^{s-1}$. This choice ensures uniform convergence of the maximum redundancy towards zero as $n$ grows towards infinity, and a proof for the next theorem can be given.
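
The estimate in (2) is straightforward to compute from running letter counts. The following Python sketch is an illustration only: the function name `estimate` and the three-letter alphabet are assumptions made here, and the logarithm in step 2 of the algorithm is taken to be base 2, which the summary does not state explicitly.

```python
from collections import Counter
from fractions import Fraction
from math import ceil, log2

def estimate(alphabet, history):
    """Return {m: p_hat(m | history)} with p_hat = (count + 1/2) / (s - 1 + L/2)."""
    L = len(alphabet)
    s = len(history) + 1                      # index of the letter about to be coded
    counts = Counter(history)
    denom = Fraction(s - 1) + Fraction(L, 2)
    return {m: (counts[m] + Fraction(1, 2)) / denom for m in alphabet}

est = estimate("abc", "aab")
print(est)

# Step 2 of the algorithm for i = 1, assuming a base-2 logarithm.
K1 = ceil(-log2(float(max(est.values()))))
print(K1)
```

Exact fractions are used so that the estimates sum to exactly one; with an empty history every letter receives probability $1/L$.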


Theorem: For the algorithm the following inequality holds for the maximum redundancy $\max R_n(M \mid E)$ over all possible sources $M$ when $E$ is determined by (2):

max  An(A<  I  log(n) -h  -  (3) 

where $c$ is a positive constant that is small compared to $\log(n)$ for large $n$.

We analyze the performance of the given algorithm and compare it with results by Davisson [4] for binary memoryless sources and the optimal homophonic algorithm [2]. The algorithm is then analyzed according to Shannon and Hellman's theory for secrecy. Examples are given for classical stream ciphers with key entropy exceeding $\log(n) + c$, the minimal key entropy for a unicity distance greater than $n$.


3)  Vm  €  Af.nli^  :=  l2'‘'p(-»(mlm*-’)J,  let  the 

(natural  chosen)  symbob  . 

represent  the  letter  m. 

4)  The  remaining  :=  2"'  -  symbols 

. €  2**'  are  chosen  as  prefix  symbols. 


5)  Choose  a  random  number  r  € 

6)  If  2'‘'r  <  transmit  i^d  go  to  1),  else 

7)  Vm  €  A/.^‘)(m|m*->)  :=  (2'‘'pf'-'’(m|m'-')  - 

Transmit  :=  i+  i,  go  to  2). 

Using  the  results  by  Jendal,  Kuhn  and  Massey  [2]  it  is  pos¬ 
sible  to  give  the  following  theorem. 
Abstract: Lower bounds on the probability of success for the different kinds of attacks in authentication with arbitration are derived. These bounds give rise to combinatorial lower bounds on the number of encoding rules and on the number of messages necessary in an authentication code with arbitration.

Summary: In the model for normal authentication the transmitter and the receiver are using the same encoding rule and are thus trusting each other. However, it is not always the case that the two communicating parties want to trust each other. Inspired by this problem Simmons has introduced an extended authentication model, here referred to as the authentication model with arbitration [1]. In this model caution is taken against deception from both outsiders (opponent) and insiders (transmitter and receiver). The model includes a fourth person, called the arbiter. The arbiter has access to all key information and is by definition not cheating. The arbiter does not take part in any communication activities on the channel but has to solve disputes between the transmitter and the receiver whenever such occur.

There are essentially five different kinds of attacks to cheat which are possible. The attacks are the following:

$I$, Impersonation by the opponent. The opponent sends a message to the receiver and succeeds if the message is accepted by the receiver as authentic.

$S$, Substitution by the opponent. The opponent observes a message that is transmitted and substitutes this message with another. The opponent succeeds if this other message is accepted by the receiver as authentic.

$T$, Impersonation by the transmitter. The transmitter sends a message to the receiver and denies having sent it. The transmitter succeeds if the message is accepted by the receiver as authentic and if the message is not one of the messages that the transmitter could have generated due to his encoding rule.

$R_0$, Impersonation by the receiver. The receiver claims to have received a message from the transmitter. The receiver succeeds if the message could have been generated by the transmitter due to his encoding rule.

$R_1$, Substitution by the receiver. The receiver receives a message from the transmitter but claims to have received another message. The receiver succeeds if this other message could have been generated by the transmitter due to his encoding rule.

In all these possible attacks to cheat it is understood that the cheating person is using an optimal strategy when choosing a message. For each way of cheating, we denote the probability of success with $P_I$, $P_S$, $P_T$, $P_{R_0}$ and $P_{R_1}$. The overall probability of deception is denoted $P_D$ and is defined to be $P_D = \max(P_I, P_S, P_T, P_{R_0}, P_{R_1})$.

For unconditionally secure authentication codes we derive the following lower bounds on the probability of success for the different kinds of deceptions:

Pj  >  2~^f®"’®r)+/{Bn;ETlAf ) 

Pr  > 

P/t,  > 

Here $E_R$ is the receiver's encoding rule and $E_T$ is the transmitter's encoding rule. The bounds are valid for all authentication codes with $|S| > 1$ except for a class of degenerate codes which all have $P_{R_1} = 1$ and hence are not very interesting.

From the above bounds we also derive lower bounds on the number of encoding rules and on the number of messages to be used in an authentication code with arbitration. Assume that the number of source states for a symmetric source is $|S|$ and let $P_D = 1/q$ for an authentication code with arbitration. Let $E_R \circ E_T$ denote the set of possible pairs $(E_R, E_T)$. Then the following lower bounds are valid on the number of encoding rules and on the number of messages that are necessary in the code,

I^tI  >  q* 

1^71  o  fr  1  >  9® 

\M\>q^\S\. 

Using  these  combinatorial  lower  bounds  it  is  for  example 
possible  to  show  that  the  cartesian  product  construction 
for  authentication  codes  with  arbitration  does  not  meet 
all  lower  bounds  with  equality,  [1]. 
One way to encrypt data for secrecy is to use a block cipher which maps $\ell$ $q$-ary message symbols into $\ell$ $q$-ary cipher symbols, i.e., $c = s_k(m)$, where $m$ denotes a message block of length $\ell$, $c$ denotes a ciphertext block of length $\ell$ and $s_k$ denotes a one-to-one transformation under the control of a secret key $k$. Without loss of generality $s_k$ can be regarded as an element of the set of permutations on $q^\ell$ objects. There are $q^\ell!$ such permutations and in practice $s_k$ needs to be restricted to a subset of all permutations to obtain a manageable keyspace size. The "art" of designing block ciphers then is to find a simple way to specify permutations from a subset of all possible permutations, without simplifying the job of the cryptanalyst who tries to break the system without knowledge of the secret key $k$.

A straightforward way to specify any permutation of $q^\ell$ objects is to specify the vector $s = (s(0), s(1), \dots, s(q^\ell - 1))$, which describes the effect of the permutation $s(\cdot)$ for each input value. In this case $s$ is the secret key, but for any practical values of $q^\ell$ this key is impractically large (e.g., $2^{70} \approx 10^{21}$ bits for $q^\ell = 2^{64}$). Another way to specify any permutation if $q^\ell > 2$ is a prime or a prime power is in the form of a polynomial, i.e., $s(x) = s_t x^t + \dots + s_1 x + s_0$, where $t = q^\ell - 2$ and $s_j \in GF(q^\ell)$. Here the key is the set of coefficients and it is readily seen that this description is as cumbersome as the $s$ vector given above. However, any permutation which can be written in polynomial form can also be obtained from a series of elementary building-block permutations, each of which is easy to specify and to compute.


Our new method of specifying permutations over $GF(q^\ell)$ is based on the function $f(x) = m x^e + c$. It specifies a permutation if $m, c \in GF(q^\ell)$, $m \ne 0$, and $\gcd(q^\ell - 1, e) = 1$. Clearly, $f(x)$ is quite easy to specify and to compute, but by itself offers little security. However, consider a sequence of $n$ building-block permutations $f_i(x) = m_i x^{e_i} + c_i$, $i = 1, 2, \dots, n$, with randomly chosen $m_i$, $e_i$, and $c_i$ (satisfying the restrictions on $m$, $e$, $c$ given above). Successive application of these building-blocks yields an overall permutation $s(x) = f_n(f_{n-1}(\dots f_1(x) \dots))$, which is easy to specify and compute for suitably chosen $n$ and which (based on our simulation results) appears to be indistinguishable from a randomly chosen permutation. In this case the key is given by $k = ((m_i, e_i, c_i);\ i = 1, \dots, n)$, and it is easy to see that the size of the key can be adapted to a wide range of needs by varying $n$. The deciphering function is also very easy to obtain by using the inverse permutations $f_i^{-1}(y) = ((y - c_i)/m_i)^{1/e_i}$, starting with $f_n^{-1}(y)$ and ending with $f_1^{-1}(y)$. Note that the exponent $1/e$ is computed modulo $q^\ell - 1$.


Example: Let $q^\ell = 23$ and let $f_1(x) = 11x^7 + 14$, $f_2(x) = 8x^{13} + 3$, $f_3(x) = 2x^9 + 6$. Then $s = (s(0), \dots, s(22)) = (4, 14, 12, 17, 1, 2, 21, 19, 10, 5, 18, 7, 11, 20, 9, 6, 0, 22, 8, 3, 16, 13, 15)$ and

$f_3(f_2(f_1(x))) = 2(8(11x^7 + 14)^{13} + 3)^9 + 6 =$

3x2^7x2^14x*^15x‘e.7x‘’*19x‘^*16x'5M5x'‘-* 


15x*^+l lx ’ * ♦9x*®+58x*^+83x®+8x^+13x^+5x®+5x^* 
3  3 

5x  +19x  *4x+4. The key is $k = ((11, 7, 14), (8, 13, 3), (2, 9, 6))$ and the deciphering function is $f_1^{-1}(f_2^{-1}(f_3^{-1}(y)))$, where $f_1^{-1}(y) = (21y + 5)^{19}$, $f_2^{-1}(y) = (3y + 14)^{17}$ and $f_3^{-1}(y) = (12y + 20)^{5}$.
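
The example can be reproduced with a few lines of modular arithmetic. The sketch below is an illustration only: the function names are assumptions, it relies on $q^\ell = 23$ being prime so that $GF(23)$ is the integers modulo 23, and it needs Python 3.8 or later for modular inverses via `pow(x, -1, n)`.

```python
P = 23                                       # field size q^l; prime, so GF(23) = Z_23
KEY = [(11, 7, 14), (8, 13, 3), (2, 9, 6)]   # (m_i, e_i, c_i) for f_1, f_2, f_3

def encipher(x, key=KEY, p=P):
    """Apply f_1, then f_2, ..., then f_n, where f_i(x) = m_i * x^e_i + c_i."""
    for m, e, c in key:
        x = (m * pow(x, e, p) + c) % p
    return x

def decipher(y, key=KEY, p=P):
    """Apply the inverses in reverse order: ((y - c_i) / m_i)^(1/e_i)."""
    for m, e, c in reversed(key):
        inv_e = pow(e, -1, p - 1)            # exponent 1/e taken modulo q^l - 1
        y = pow((y - c) * pow(m, -1, p) % p, inv_e, p)
    return y

table = [encipher(x) for x in range(P)]
print(table)
print(all(decipher(table[x]) == x for x in range(P)))
```

For a prime-power field that is not prime, the integer arithmetic would have to be replaced by arithmetic in $GF(q^\ell)$; the structure of the two loops stays the same.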
This paper suggests a scheme in which cryptographically strong permutations can be randomly selected from a large proper subset of the permutations on blocks of binary numbers which have certain properties of cryptographic strength that are independent of the underlying Boolean functions.

♦  ♦  • 

Let $Z_2^n$ be the group of $n$-bit binary numbers under coordinatewise addition modulo 2. An orthomorphism is a 1-to-1 mapping $R: Z_2^n \to Z_2^n$ so that $\{x \oplus R(x) \mid x \in Z_2^n\} = Z_2^n$.

Now, let $S(x) = x \oplus R(x)$, then $S(x)$ is a permutation on $Z_2^n$. $S(x)$ will have a single fixed point, assumed here to be $0 = 00\cdots0$, and otherwise maximal. It can be represented as a permutation $(0)(x_1, x_2, \dots, x_m)$ or a set of $m = 2^n - 1$ equations:

$x \oplus R(x) = S(x)$

$0 \oplus 0 = 0$

$x_m \oplus x_1 = z_1$

$x_1 \oplus x_2 = z_2$

Figure 1

where $R(x_{k-1}) = x_k$ and $S(x_{k-1}) = z_k$.

As in any mapping on $Z_2^n$, the orthomorphic mappings $x \to S(x)$ can be linear, affine, or nonlinear. The linear version is actually linear only at the bit level, but nonlinear at the integer level. It is common to express block substitutions in the form of Boolean functions on $x \in Z_2^n$ where $f_i(x) = 0, 1$ is the value in the $i$th bit position of the encrypted image of $x$. Orthomorphic mappings are generated by other means but can be expressed by Boolean functions if desired; the Strict Avalanching Criterion (SAC), the Bit Independence Criterion (BIC), and other desirable properties in a block substitution or permutation depend on the defining Boolean functions. The process of changing such substitutions raises questions as to the strength of the replacement. Because it would be useful to have a class of block substitutions which possess some property of cryptographic strength that is invariant under change of the defining Boolean functions, a class of so-called balanced block substitutions is offered. The definition of balanced block substitutions is based on the fact that $Z_2^n$ is an additive group of order $2^n$ and has $2^n - 1$ maximal subgroups [2], e.g., the even numbers.

A permutation or block substitution on $Z_2^n$ is said to be balanced if it maps each maximal subgroup half into itself and half into its complement. This can be expressed in terms of Shannon's information theory: if $M$ is any maximal subgroup of the $n$-bit numbers, then for a balanced mapping $x \to y$ the uncertainty may be expressed as:

$H(x \in M \mid y \in M) = -\log_2 P(x \in M \mid y \in M) = -\log_2(0.5) = 1.$

A  permutation  or  block  substitution  on  Z^  is  shown 
to  be  balanced  if  and  only  if  it  is  an  orthomorphism, 
irrespective  of  whether  it  is  linear,  affine,  or  nonlinear. 
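
The equivalence between balance and the orthomorphism property can be checked exhaustively for very small $n$. The Python sketch below is an illustration, not the generation method of the paper: it assumes that $n$-bit numbers are represented as integers, that the maximal subgroups are the sets $\{x : a \cdot x = 0\}$ for nonzero bit masks $a$, and the function names are invented here.

```python
from itertools import permutations

def is_orthomorphism(R):
    """R[x] is the image of x; both x -> R[x] and x -> x ^ R[x] must be bijections."""
    size = len(R)
    full = list(range(size))
    return sorted(R) == full and sorted(x ^ R[x] for x in range(size)) == full

def maximal_subgroups(n):
    """Index-2 subgroups of Z_2^n: numbers with even parity under each nonzero mask."""
    return [[x for x in range(2 ** n) if bin(a & x).count("1") % 2 == 0]
            for a in range(1, 2 ** n)]

def is_balanced(R, n):
    """Each maximal subgroup is mapped half into itself, half into its complement."""
    for M in maximal_subgroups(n):
        inside = set(M)
        if 2 * sum(1 for x in M if R[x] in inside) != len(M):
            return False
    return True

n = 2
agree = all(is_orthomorphism(list(R)) == is_balanced(list(R), n)
            for R in permutations(range(2 ** n)))
print(agree, is_orthomorphism([0, 2, 3, 1]))
```

The mask $a = 1$ gives the even numbers, the example of a maximal subgroup mentioned above. Enumerating all permutations is only feasible for tiny $n$.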

Because an orthomorphic permutation can be described by a set of $2^n$ equations, an approach is to generate these randomly with constraints, taking advantage of the balance property. However, this is a rather inefficient process.

An alternate method is to construct an orthomorphism which is linear at the bit level and modify it to be nonlinear. In that case, the equations in Figure 1 take the form:

$x_{k-1} \oplus x_k = x_{k-p}$

for some integer $p$ and for all indices $k$. The permutation $(0)(x_1, x_2, \dots, x_m)$ represented by the order of the $\{x_{k-1}\}$, $\{x_k\}$, and $\{x_{k-p}\}$ in the set of equations is also the same as that specified by the mapping $x \to R(x)$. Additional orthomorphic permutations are defined by any power $s$ of this permutation. The result is a new orthomorphic linear permutation $S^s(x)$ defined by $x \to R^s(x)$, represented by a set of $m$ equations:

$x_{k-s} \oplus x_k = x_{k-p_s}$

The integer $p_s$ is a function of $s$. This holds for $1 \le s \le 2^n - 2$, so that for one basic orthomorphic permutation, a family of $2^n - 2$ orthomorphic permutations, all linear at the bit level, is generated. This is a transitive group of permutations [3]. This property is also invariant under change of the Boolean functions defining the basic block substitution if it is a linear orthomorphism. Any or all of these can now be converted to nonlinear permutations by suitable modifications to a subset of the equations, or, equivalently, by altering the order in two of the three columns of numbers.
Abstract

We define two classes of strategies for substitution attack and derive lower bounds on the probability of deception for each class for codes with perfect protection for impersonation. We show that the equality of the two bounds uniquely determines the number of encoding rules and forces the incidence matrix of the code to be that of a BIBD. It also implies that random selection from the remaining cryptograms gives the same probability of deception to the enemy as random selection from the set of keys that are incident with the intercepted codeword.


1  Preliminaries 

We consider an authentication scenario in which a transmitter wants to send a message to a receiver over a publicly exposed channel and an enemy tries to deceive the receiver into accepting a fraudulent message as genuine. An authentication code (A-code) is a collection $Z$, $|Z| = E$, of mappings from the set $X$, $|X| = k$, of the source states into the set $Y$, $|Y| = M$, of codewords. Let $Y_z$ denote the subset of codewords that are authentic under the key $z \in Z$ and $Z_y$ denote the subset of encoding rules that are incident with $y \in Y$. The incidence matrix of an A-code is a zero-one matrix of size $E \times M$ in which $a_{ij} = 1$ only if $y_j \in Y_{z_i}$.

The communicants use a probability distribution $x = (x_1, \dots, x_E)$ on the key space as their strategy. In a substitution attack the enemy intercepts a codeword and tries to substitute it with a fraudulent one.


2  Lower  Bound  for  Substitution  Attack 

We  consider  two  possible  courses  of  action  (classes 
of  strategies)  for  the  enemy  and  find  bounds  on  the 
probability  of  deception  in  each  case. 

Assume the enemy intercepts a cryptogram $v \in Y$. In a class $K_1$ strategy the enemy chooses a probability distribution $p^v$ on $Z$ which is nonzero only on $Z_v$. The enemy uses this distribution to select a key $z \in Z_v$ and then randomly chooses a codeword of $Y_z$ for substitution. Let $P_0$ and $P_1$ denote the best probability of deception in impersonation and substitution attack respectively.

Proposition  2.1  If  the  source  is  uniform  and  Po  = 
k/M  we  have 

The bound is achieved if $p^v$ is uniform for all $v$.

In class $M_1$ strategies the enemy chooses a probability distribution $g^v$ on the reduced cryptogram set $Y \setminus v$ with $g^v(v) = 0$, and uses it to choose a cryptogram for substitution. We have [1]

C”'  >  ^  (2) 

and the bound is achieved when $g^v = 1/(M - 1)$ for all $v$. Combining bounds 1 and 2 we have Theorem 2.1.

Theorem  2.1  For  a  uniform  source,  if  Po  =  k/M  the 
probability  of  deception  is  lower  bounded  by 


f  IWM  -  *  y 


The two bounds are equal if $E = E_0 =$
which case the incidence matrix of the code corresponds to a BIBD.

This  result  is  in  accordance  with  Stinson’s  [2]. 

Bounds 1 and 2 are achieved for the random strategies of class $K_1$ and $M_1$ respectively. If $E > E_0$ the random strategy of class $K_1$ gives a higher probability of success and if $E < E_0$ the random strategy of class $M_1$ is superior.
When analysing the security of electronic cash systems [1,3] one comes up with a question concerning RSA signatures. This question arises when looking at the cut and choose method of the withdrawal protocol. Previous results [2] cannot be applied. Using some assumptions this problem is solved.

Introduction 

A  way  to  protect  the  privacy  of  the  user  in  electronic  cash  systems 
is  to  use  the  cut  and  choose  idea  in  the  withdrawal  protocol.  At  the 
end  of  the  withdrawal  protocol  [1,3]  the  user  obtains  an  RSA  root  [4] 
on  the  product  of  the  numbers  that  were  not  opened  by  the  bank.  The 
numbers in the signed product, which represents money, are supposed to contain the identity of the user, which prevents the user from double-spending his money. Since the bank cannot verify all numbers before
signing  the  user  can  try  to  cheat.  For  the  bank  it  is  important  to  know 
what  kind  of  money-representing  signatures  a  cheating  user  can  obtain. 

This  question  is  formalized  next.  After  that  the  assumptions  that 
are  used  to  solve  this  problem  are  given  and  finally  the  results  are 
presented.  For  more  details  and  proofs  see  [5]. 

Formal  statement  of  the  problem 

Let $n$ be an RSA modulus [4], $e$ and $d$ integers such that $e \cdot d \equiv 1 \pmod{\varphi(n)}$ and $C$ a set of numbers coprime with $n$. The numbers $e$ and $n$ are public while the number $d$ and the factorization of $n$ are not. The elements of $C$ are images of a one way function $F$. The sentence “Choose an element $a$ from the domain of $F$ and compute the image $x = F(a)$” is for convenience abbreviated to “Let $x \in C$”. Similarly with subsets of $C$. For subsets $W$ of $C$ the number $\overline{W}$ is defined as the product of all the elements of $W$ modulo $n$.

In  the  case  of  the  electronic  cash  system  each  element  from  the 
domain  of  F  determines  the  identity  of  some  user. 

An honest user doing the withdrawal protocol would choose some set $W \subseteq C$ with all elements corresponding to his identity. The bank would ask him to open some numbers $x \in W$ and the user would obtain $\overline{X_1}^{\,d}$ for some subset $X_1$ of $W$.

A cheating user however chooses at least one number not containing his identity. Instead, he chooses a number $x$ modulo $n$ in a clever way. So w.l.o.g. he chooses some set $W \subseteq C$ with all elements corresponding to his identity, and a number $z$ modulo $n$. The bank then asks him to open some of these numbers and if he is not caught (i.e. the number $z$ is not chosen by the bank), he obtains $(\overline{X_1} \cdot z)^d$ for some subset $X_1$ of $W$. Since this signature does not represent money he tries to compute $\overline{Y_1}^{\,d}$ from it for some subset $Y_1$ of $C$. Note that a cheating user can always (try to) obtain a signature on the opened numbers by computing
2  =  IV"  (mod  n). 

The central problem in this paper can now be stated as:

Let $l \ge 1$. Let $X_i$ and $Y_i$ be subsets of $C$ ($i = 1, \dots, l$).

Is it feasible to compute, without knowing the factorization of $n$, a number $z$ coprime with $n$ such that for each $1 \le i \le l$ it is feasible to compute $\overline{Y_i}^{\,d}$ from $(\overline{X_i} \cdot z)^d$ modulo $n$?

Assumptions 

Four  assumptions  are  made  (their  interpretation  follows  below): 

Prime:  The  root  e  is  a  fixed  prime,  at  least  5  (for  e  =  3  the  results 
are  different). 

Subset: The sets X_1 to X_l are not subset-related, i.e. there are no
two sets X_i and X_j (i ≠ j) such that X_i ⊂ X_j.

Rootcomputability: Let x and y be coprime with n. If it is
feasible to compute x^d from {x, y, y^d} modulo n without knowing the
factorization of n, then it is feasible to compute a number r ∈ {0, ..., e−1}
and a number a coprime with n from {x, y} such that x = y^r a^e
(mod n).

Rootinfeasibility: Let k ≥ 1 and let x_1 to x_k be k different
elements of C. Then it is infeasible to compute numbers r_1, ..., r_k ∈
{0, ..., e − 1} not all zero, and a number a coprime with n such that
x_1^{r_1} ··· x_k^{r_k} = a^e (mod n).


In the case of the electronic cash system the subset assumption is
satisfied because the number of opened numbers is fixed. The
rootcomputability assumption means that if an RSA-root is computable from
another RSA-root, this computation can be done using only
multiplications, divisions and exponentiations. Note that this excludes cases
like x = (DES(y^d))^e (mod n), but for randomly chosen x this seems
to be a reasonable assumption. The rootinfeasibility assumption means
that it is infeasible to compute (non-trivial) e-th roots on products of
elements of C. The essential restriction on the r_1, ..., r_k is that at least
one is not zero. Realizing that the numbers in the set C are images of
a one-way function makes this assumption reasonable.
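
The multiplicative relation in the rootcomputability assumption can be illustrated with a small sketch. It only shows the easy direction: if x = y^r a^e (mod n), then the RSA-root of x follows from the RSA-root of y by one exponentiation and one multiplication. The toy modulus, the exponent and the values of y, a and r are hypothetical choices for illustration and are not taken from the paper.

```python
from math import gcd

# Toy RSA parameters (hypothetical, far too small for real use).
p, q = 61, 53
n = p * q                       # public modulus
phi = (p - 1) * (q - 1)
e = 17                          # public root: a prime, at least 5
d = pow(e, -1, phi)             # secret exponent (needs Python 3.8+)

def rsa_root(v):
    """e-th root modulo n, computable only with the secret exponent d."""
    return pow(v, d, n)

def root_from_relation(y_root, r, a):
    """Given y^d and a relation x = y^r * a^e (mod n), return x^d
    using only an exponentiation and a multiplication."""
    return (pow(y_root, r, n) * a) % n

# Illustrative values, all coprime with n.
y, a, r = 1234, 567, 5
assert gcd(y, n) == 1 and gcd(a, n) == 1

x = (pow(y, r, n) * pow(a, e, n)) % n       # x = y^r * a^e (mod n)
derived = root_from_relation(rsa_root(y), r, a)
direct = rsa_root(x)
print(derived == direct)
```

The assumption in the text is the converse statement: whenever x^d is feasibly computable from y^d, a relation of this form can be found.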

Results 

Using all four assumptions the problem is solved. W.l.o.g. all Y_i
are not empty and l ≥ 2. The sentence “a number z coprime with n
such that for each 1 ≤ i ≤ l it is feasible to compute (ΠY_i)^d from (ΠX_i · z)^d
modulo n” is for convenience abbreviated to “such a number z”.

Theorem 1. If it is feasible to compute such a number z, then (the
sets Y_1 to Y_l are not subset-related) or (there is a number j ∈ {1, ..., l}
such that the sets Y_i for i ≠ j are not subset-related and Y_j ⊂ Y_i for
every i).

Theorem 2. Suppose the first case of Theorem 1 is satisfied. Define
the set U as the union of all X_i, the set X as the intersection of all X_i
and the set Y as the intersection of all Y_i (1 ≤ i ≤ l). Then it is feasible
to compute such a number z if and only if ∀_{1≤i≤l}[Y_i = (U \ X_i) + Y] or
∀_{1≤i≤l}[Y_i = (X_i \ X) + Y].

Theorem 3. Suppose the second case of Theorem 1 is satisfied.
W.l.o.g. j = 1. Define the set U as the union of all X_i, the set X
as the intersection of all X_i and the set Y as the intersection of all Y_i
(2 ≤ i ≤ l). Then it is feasible to compute such a number z if and only
if  {Vj<,x.[Y;  =  (1/  \  a.)  -I  V)  and  V  =  ( a,  -r  If)  and  V.  =  (1/  \  A, )}  or 
{V,«<:fV  =  (Xi  \  A)  -I  V]  and  V  =  (A,  F  A)  and  V.  =  (A,  \  A)}.' 

From Theorem 1 it follows that if such a number z is computable, the
Y_i (1 ≤ i ≤ l) are related in only two possible ways. The first possibility
is treated in Theorem 2. The second possibility is treated in Theorem 3.

Inspection of the proofs shows that if such a number z is
computable, it is easy to compute such a number z. Furthermore, the
proofs do not use the fact that the elements of C are images of a one-way
function, although these numbers have to satisfy the rootinfeasibility
assumption.

When applying the results to an electronic cash system one has to
realize that a signature can only represent money if the cardinality of
Y_i has some specific value. Therefore Theorem 2 can be applied. When
translating the results of Theorem 2 to cheating-user strategies it follows
that a cheating user can try to replace some not-opened numbers that
contain his identity (the elements of X) by other numbers that do not
contain his identity (the elements of Y). The remaining signed numbers
can be either the opened numbers or the other not-opened numbers.
Note that a cheating user is caught with probability 0.5 independent
of his strategy, although the probability that his strategy succeeds is
generally less.

Further research can be done in the area of cheating users who
want to combine their obtained signatures to produce other signatures.
Finally I would like to thank David Chaum, Matthijs Coster, Hendrik
Jan Evertse, Eugène van Heyst and Henk van Tilborg for their useful
comments and discussions.
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SUMMARY 


Xinmei [1] first proposed a digital signature
scheme based on error-correcting codes. Later,
Harn and Wang [2] modified the Xinmei scheme to
improve its security and performance.
Recently, Alabbadi and Wicker [3] cryptanalysed
the Xinmei scheme and the Harn-Wang modified scheme, and
pointed out that under a chosen-message attack
the private keys of these schemes can be obtained
in polynomial time. Furthermore, in this paper,
we show that if the public keys W, H, J, and T
designed for the schemes satisfy the
condition that the matrix [W, H^T] or [J, T] is
nonsingular, then the private keys are easily
inferred in polynomial time under a known-signature
attack. Finally, some examples are given
to illustrate our conclusions.

1. Description of the Xinmei signature scheme:

User A of the scheme chooses an (n, k, d) binary
Goppa code C with a k×n generator matrix G, an
(n−k)×n parity check matrix H, and t-error-correcting
capacity. The public keys of the
scheme are

$J = P^{-1} G^{*} S^{-1} = P^{-1} W$,

$W = G^{*} S^{-1}$, $T = P^{-1} H^{T}$, and H, t, t'.

The private keys are SG and P, where G* is
the matrix which satisfies

$G G^{*} = I_k$; $H^{T}$ is the transposed matrix of H.

The signature C_i of a k-bit message M_i is as
follows:

$C_i = (E_i + M_i SG) P$,

where E_i is an n-bit random vector with the
Hamming weight $w(E_i) \le t' < t$ chosen by user A.


2. An attack on the scheme:

If [W, H^T] is full-rank (then [J, T] is full-rank
too, and vice versa), then

$P = [W, H^{T}]\,[J, T]^{-1}$;

the computational complexity of calculating P is
O(n^3). The next step is to get the other private
key SG, knowing P, under a known-signature attack.

As

$C_i = (E_i + M_i SG) P$,

$C_i P^{-1} - E_i = M_i SG$.

Suppose that we know k signature-message-error
pattern triplets {(C_i, M_i, E_i)}; then we
can produce a matrix equation:

$C' = [(C_1 P^{-1} - E_1)^{T}, \ldots, (C_k P^{-1} - E_k)^{T}] = (SG)^{T} [M_1^{T}, \ldots, M_k^{T}]$.

If the k messages M_1, ..., M_k are linearly independent,
then

$(SG)^{T} = C' \, [M_1^{T}, \ldots, M_k^{T}]^{-1}$.

This step can be fulfilled in O(k^3) operations. So
the total computational complexity of this attack
is O(n^3).
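
The first step of the attack, recovering P from the public matrices, can be sketched over GF(2) as follows. This is a minimal illustration under stated assumptions: a random invertible matrix stands in for the full-rank block matrix [W, H^T] (no Goppa code is constructed), the size n = 8 is arbitrary, and the helper names are hypothetical.

```python
import random

def gf2_matmul(A, B):
    """Matrix product over GF(2); matrices are lists of 0/1 rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[t] & B[t][j] for t in range(inner)) & 1
             for j in range(cols)] for row in A]

def gf2_inverse(A):
    """Gauss-Jordan inverse over GF(2); returns None if A is singular."""
    n = len(A)
    M = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col]), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [x ^ y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def random_invertible(n, rng):
    """Draw random 0/1 matrices until one is invertible over GF(2)."""
    while True:
        A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n)]
        if gf2_inverse(A) is not None:
            return A

def recover_P(W_Ht, J_T):
    """P = [W, H^T] [J, T]^{-1}, valid when the block matrices are full-rank."""
    return gf2_matmul(W_Ht, gf2_inverse(J_T))

rng = random.Random(0)
n = 8
P = random_invertible(n, rng)               # secret matrix
W_Ht = random_invertible(n, rng)            # stand-in for the public [W, H^T]
J_T = gf2_matmul(gf2_inverse(P), W_Ht)      # public [J, T] = P^{-1} [W, H^T]
recovered = recover_P(W_Ht, J_T)
print(recovered == P)
```

The single n×n inversion is the dominant cost of this step.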

This attack is also effective against Harn-Wang's
modified scheme.

To avoid this attack, the scheme designer must
pick public keys W, H, J, and T such that they
satisfy the requirement: the matrix [W, H^T] or [J, T]
is not full-rank.

3.  Examples  (omitted) 

This work was supported in part by the China
National Natural Science Foundation and the China
National Information Security Key Lab Foundation.
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Abatraet 

Runlength-limited (RLL) codes are widely used in
magnetic and optical recording systems to aid in bit
synchronization and reduce the effects of intersymbol
interference. Asymptotic lower bounds on the size of
these codes as a function of minimum distance have been
recently reported by Kolesnik and Krachkovsky. These
lower bounds are generalized to include cost
constraints. Asymptotic upper bounds for the size of
runlength-limited codes are also investigated, and two
separate bounds are presented. Finally, the maximum rate
at which information can be transmitted across a
noiseless channel, using sequences produced by a
nondeterministic graph, is lower bounded. The bound is
derived using generating function techniques.

Introduction

In order to minimize the effects of intersymbol
interference and aid in bit synchronization, many
digital transmission and recording systems restrict the
set of allowable binary channel sequences. A commonly
used constraint imposes limits on the minimum and
maximum number of consecutive zeros which may follow a
“1” in the binary channel stream. Sequences which
satisfy these runlength conditions are said to be (d,k)
runlength-limited (RLL), where d and k are the minimum
and maximum zero runlengths, respectively.

Channels are in general noisy, and it may be
desirable to incorporate an error-correcting capability
into the communication or storage system at the expense
of data rate. The error-correcting capability of a code
is a function of the distance distribution between
codewords. The minimum distance between any two
codewords, d_min, is a parameter of particular interest.
A fundamental problem in coding theory is to determine
the maximum size of a code having a given d_min. This
problem has been studied extensively for unconstrained
codes and remains unsolved, although upper and lower
bounds have been derived. Recently, Kolesnik and
Krachkovsky have reported an asymptotic lower bound for
runlength-limited codes. They obtained their result
using a sphere packing type argument combined with a
generating function technique. Asymptotic upper bounds
for runlength constrained codes, however, have not been
reported.

The purpose of this paper is threefold. First,
Kolesnik and Krachkovsky's asymptotic lower bound is
extended to include costly runlength-limited
sequences. Second, two asymptotic upper bounds are
derived for the maximum size of RLL-codes as a function
of the code's minimum distance. Third, the maximum rate
at which information can be transmitted across a
noiseless channel, using sequences produced by a
nondeterministic graph, is lower bounded.

RLL-Codes

It is known that (d,k)-RLL sequences can be described by
a finite state machine having (k+1) states. It is
convenient to represent the finite state machine by a
graph with (k+1) vertices. The edges between the
vertices indicate the possible state transitions, and
the labels on those edges give the corresponding output
bits in the RLL-sequence. Costs can be assigned to
RLL-sequences by attaching a cost to each edge in the
graph [2]. We show that the maximum rate, R, at which
information can be transmitted across a noiseless
channel using only those sequences with average cost per
bit less than α, and minimum distance greater than δ, is

lower bounded by

RA  o)  i  R(a  a)  -  mta  (]o|.X,(x,y,z)  -  alot,(xy)  -  8log,(z)] 


where λ(x,y,z) is the largest positive eigenvalue of a (k+1)^2
by (k+1)^2 transition matrix which can be computed from
the graph. It is also shown that R(0,α) is given by

R(Pt  a)  w  miD  po^(x.y.l)  -  odog.(zy)] 

Ootysl 

and is equal to the maximum entropy of the Markov source
generated by assigning transition probabilities to the
edges of the graph.

Asymptotic Upper Bounds on the Size of RLL-Codes

Two asymptotic upper bounds are derived for the maximum
size of a runlength-limited code as a function of its
minimum distance δ. Let R(δ) equal the maximum rate at
which information can be transmitted across a noiseless
channel using RLL-codes with minimum distance δ.
Runlength-limited codes can be divided into constant weight
subsets, and the code rate, R*(w), of the weight w
subset is computed using combinatorial arguments. It is
shown that

$R(\delta) \le \max_{0 \le w \le 1} \min\left[ R^{*}(w),\; R^{**}(\delta, w) \right]$

where R**(δ,w) is the McEliece-Rodemich-Rumsey-Welch
linear programming bound for the maximum rate of
(unconstrained) constant weight codes having minimum
distance δ. A second asymptotic upper bound is also
derived from upper bounds on the capacity of
input-constrained discrete memoryless channels. The simplest
version of this bound yields

$R(\delta) \le C(\delta/2)$

where C(δ/2) is the capacity of a runlength-limited,
input-constrained binary symmetric channel, which has
cross-over probability δ/2. A tight upper bound for the
right-hand side of Eq. (4) is then obtained using
techniques developed by Shamai and Kofman.

Bounds on the Size of Nondeterministic Finite State
Codes

If a distance constraint is not imposed, then the
asymptotic rate of an RLL-code is given by the logarithm
of the largest eigenvalue of the graph's adjacency matrix.
This result is generally true for any finite state code
provided the graph is deterministic, that is, the edges
leaving any given vertex have unique labels. A
nondeterministic graph having m vertices can always be
mapped into an equivalent deterministic graph, but the
new graph may have as many as 2^m − 1 vertices. Thus,
when m is large it may be computationally difficult to
determine the largest eigenvalue of the new graph's
adjacency matrix. A new lower bound has been developed
for the maximum rate at which information can be
transmitted across a noiseless channel using sequences
produced by a nondeterministic graph. The lower bound
is expressed in terms of the largest eigenvalues of a
pair of m^2 by m^2 matrices, and these matrices are easily
found from the original graph.
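
The statement that, without a distance constraint, the asymptotic rate of a (d,k) RLL code equals the logarithm of the largest eigenvalue of the adjacency matrix can be checked numerically. The sketch below builds the (k+1)-state graph described earlier and estimates the eigenvalue by power iteration. The function names, the state labelling (number of zeros since the last one) and the iteration count are illustrative assumptions, and d < k is assumed so that the iteration converges.

```python
from math import log2

def rll_adjacency(d, k):
    """Adjacency matrix of the (k+1)-state (d,k) graph.
    State i means: i zeros have been emitted since the last one."""
    size = k + 1
    A = [[0] * size for _ in range(size)]
    for i in range(size):
        if i < k:
            A[i][i + 1] = 1      # emit another zero
        if i >= d:
            A[i][0] = 1          # emit a one and reset the zero run
    return A

def largest_eigenvalue(A, iterations=5000):
    """Power iteration with max-norm scaling (assumes a primitive matrix)."""
    v = [1.0] * len(A)
    lam = 1.0
    for _ in range(iterations):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        lam = max(w)
        v = [x / lam for x in w]
    return lam

def rll_capacity(d, k):
    """Asymptotic rate in bits per channel symbol, base-2 logarithm."""
    return log2(largest_eigenvalue(rll_adjacency(d, k)))

for d, k in [(0, 1), (1, 3), (2, 7)]:
    print((d, k), round(rll_capacity(d, k), 4))
```

For a nondeterministic graph the same computation would have to be run on the determinized graph, which is where the growth to as many as 2^m − 1 vertices becomes the obstacle.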
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Summary 

Partial response models for recording
channels go back many years [1]. This method
of channel modeling has led to an interest
in developing trellis codes geared to such
channels [2-7]. This talk describes a
comparison of certain techniques for code
construction that combine run-length
limiting constraints with constraints that
improve the free distance (and thus the noise
tolerance) over common models for
magnetic recording channels. Three partial
response channels are considered: "1 − D",
"1 − D^2" and "1 + D − D^2 − D^3".

We consider two techniques to define
codes and find encoders with run-length
constraints and coding gain. In the first
constraint type, a convolutional code is used
to constrain the locations of the transitions
of the signal [4]. In this case, a convolutional
code over the ring of integers modulo q
is specified. The code is used to constrain
the sequence of transition times, modulo q;
the sequence must be a member of the fixed
convolutional code. It has been shown that
this is an effective technique for finding codes
with a non-trivial d constraint (i.e., d > 0) [4,
7]. The other construction is based on a recent
result of Siegel and Karabed [5-7], and shows
more promise in the d = 0 case. Their result
shows that matching the channel null with
a null in the codebook leads to a coding gain.

In this talk we present a comparison of
runlength codes we have constructed. The
resulting combined constraints are specified
by labeled directed graphs. Using
Mathematica, we automatically construct a
code given a graph specifying a constraint and
a rate less than or equal to its capacity. Thus,
the main problem is finding interesting
constraints. A constraint is said to have a
certain free distance d_free, for a given partial
response channel, if any
two runlength sequences satisfying the
constraint have distance at least d_free at the
output of the channel (and at least one pair
of codewords has distance d_free). An important
issue is the number of states, both in the
original constraint and in the final code. The
number of states in the constraint determines
the complexity of the decoder; we decode these
codes by finding the signal satisfying the
constraint that is closest to the received signal.
(It is possible that a received signal could thus
be decoded to something that isn't actually a
codeword from the encoder; the probability
of making an error of this type is no more
than that of picking a codeword that is
incorrect.) The number of states is important
for encoding and uncoding. A table is
presented that summarizes the codes that
we have found.
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Peak  detectors  are  a  common  practice  in  standard  high 
density magnetic recording systems [1]-[3]. It has been
noticed that one of the major impairments in these systems is
the so-called bit shift (peak shift) [1]-[5]. In this talk we
propose a new coding technique for correcting peak shifts.
A coded system that operates over a multi-peak (bit) shift
channel (PSC) is best formulated in terms of phrase lengths,
where a phrase is uniquely defined by a consecutive
sequence of bits starting with none, one or more zeros ('0')
and terminating with the first single one ('1') [6]. Any
binary sequence of zeros and ones is uniquely decomposable
into a concatenated sequence of phrases.
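
The decomposition into phrases can be sketched in a few lines. This is only an illustration of the definition above: the function name is hypothetical, and the input is assumed to be a string of '0' and '1' characters that ends with a '1', so that the last phrase is complete.

```python
def phrase_lengths(bits):
    """Split a binary string into phrases (zero or more '0's closed by a '1')
    and return the list of phrase lengths."""
    lengths, run = [], 0
    for b in bits:
        run += 1
        if b == "1":             # a '1' terminates the current phrase
            lengths.append(run)
            run = 0
    if run:
        raise ValueError("sequence must end with a '1'")
    return lengths

# "001" + "01" + "1" + "0001"
print(phrase_lengths("0010110001"))
```

In a (d,k) constrained sequence every phrase length produced this way lies between d+1 and k+1.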

The new coding scheme is described as follows. A sequence
of phrases U = {..., U_i, ...}, U_i ∈ S = {d+1, ..., k+1}, is mapped
to the sequence of symbols V = {..., V_i, ...}, V_i ∈ GF(q),
according to

V_i = U_i − (d+1) mod q. (1)

The sequence V is passed through a rate k_0/n systematic
convolutional code over GF(q) that converts k_0 input
symbols into n output symbols. The encoder output


bit shifts has occurred at the end of the i-th phrase. The set
of legitimate peak shifts is {0, ±1, ..., ±j, ..., ±t}. We assume
here that {e_i} is an independent identically distributed
(i.i.d.) sequence with some probability distribution.
The output phrase length is in the set {d−2t+1, ..., k+2t+1}. At
the receiver the decoder produces a maximum likelihood
estimate of the sequence U which we write as Û.

In the talk we will analyze the main properties of the codes,
the error correction capability of the proposed system, and
practical decoding techniques. An optimal decoder as well as a
suboptimal decoder will be introduced. Interesting upper
and lower bounds on the maximum data rate that can be
transferred as a function of the convolutional code rate that
is used are also worked out and discussed.

For example it will be shown that the maximum information
rate that can be transmitted using this technique over a
noiseless channel is bounded by

Rf(d+l+f)^))^  R  S  R((d+lP^)). 

n  ft 

Here, R(x) is the solution to the equation
k+1 

J^2-iJ^(x),2-xR(x). 

i^+1 

References


sequence C is mapped back to a sequence of phrases in S
according to

”  ■  IQ  ko+r  +d+l,r=ko, ....  n-1  '  . 

The  combined  coding  and  mapping  defines  a  trellis  code 
with  (k-l-d)V  encoder  states,  where  v  is  the  constraint  length 
of  the  convolutional  code.  Note  that  the  parity  check  phrases 
of  the  trellis  code  are  in  the  set  W+I. ....  d+^). 

The output of the trellis encoder is passed through a peak
(bit) shift channel (PSC). An r-position bit shift causes the '1'
symbol terminating the phrase to wander by r positions to
its right (right shift) or to its left (left shift). The bit shift
effect either shrinks or expands the input phrase. Of course
the  phrase  lengths  are  not  modified  if  no  bit  shift  has  taken 
place. We restrict our discussion to d ≥ 2t. Therefore,
in this case additional phrases are neither generated nor
are existing phrases destroyed, and the parity check symbols
do not violate the (d,k) constraint.

Let X_i and Y_i stand respectively for the i-th channel input phrase
length and for the corresponding channel output phrase
length. Then, the peak shift channel is described by

$Y_i = X_i + e_i - e_{i-1}$. (3)

Here e_i is a random variable taking values in {0, ±1, ..., ±j, ..., ±t},
designating whether a left (−j), a right (+j) or no (0)
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Abstract 

We describe a class of dc-free subcodes of convolutional codes
that satisfy certain runlength constraints and that also possess
error-correcting capability. The running disparity and the maximum
runlength of these codes are bounded by quantities that are
independent of the free distance. Decoding is accomplished using a
Viterbi decoder for the underlying convolutional code.

1.  Introduction 

Deng and Herro [1] describe a method for constructing a dc-free
coset code from a given block code. Their codes satisfy

and 

K^2W  +  [  yJ  -  1 
KS2{w+[-^J}-l. 

where R_max is the maximum running disparity, K+1 is the maximum
runlength, W is a quantity that is greater than or equal to the minimum
Hamming distance of the code, and ⌊x⌋ denotes the largest integer
less than or equal to x [1], [2]. Thus, these bounds tend to increase as
the error-correcting capability of the code increases.

Using a procedure similar to that of Deng and Herro [1], we have
developed a method for finding a subcode of a convolutional code that
has a spectral null at dc and at the same time satisfies certain runlength
constraints. We describe this construction procedure and state the
basic properties of the codes it produces in Section 2. We describe
some representative codes obtained using this procedure in Section 3.

2.  Construction  of  the  Codes 

We begin with a convolutional code with rate (m−1)/m whose m−1
input sequences and m output sequences are {a_n^i}, 1 ≤ i ≤ m−1, and
{b_n^j}, 1 ≤ j ≤ m, respectively. The (m−1)×m generator matrix is

$G(D) = \begin{bmatrix} G_1^1(D) & G_1^2(D) & \cdots & G_1^m(D) \\ \vdots & & \ddots & \vdots \\ G_{m-1}^1(D) & G_{m-1}^2(D) & \cdots & G_{m-1}^m(D) \end{bmatrix}$

where

$G_i^j(D) = g_{i,0}^j + g_{i,1}^j D + g_{i,2}^j D^2 + \cdots$

is the generator polynomial determining the relationship between input
sequence i and output sequence j.

Assume that for one of the (m−1) inputs, the following
requirement is satisfied:

$g_{i,0}^j = 1, \quad 1 \le j \le m.$ (1)

Without loss of generality, we take the input satisfying (1) to be input
1. We use input bit a_n^1 not for transmitting information, but for
controlling the value of the running disparity. We denote the running
disparity prior to the encoding of input block n by R_{n−1}.

It can easily be verified that by complementing the value of a_n^1
we complement all m output bits in block n. This property enables us
to control the value R_n through the following encoding procedure:

1. Choose a_n^1 = 0 and compute the disparity of the m output bits in
block n; denote this disparity by r_n.

2. If R_{n−1}·r_n < 0, encode the m−2 information bits; R_n = R_{n−1} + r_n.

3. If R_{n−1}·r_n = 0, choose the value a_n^1 to reduce the runlength and
then encode the m−2 information bits; if a_n^1 = 0, R_n = R_{n−1} + r_n;
otherwise R_n = R_{n−1} − r_n.

4. If R_{n−1}·r_n > 0, change a_n^1 to 1 and then encode the m−2
information bits; R_n = R_{n−1} − r_n.
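
The effect of the four steps on the running disparity can be sketched with a simplified model. The sketch below is not the convolutional encoder itself: it treats each block of m output bits (computed with the control bit set to 0) as given, ignores encoder memory, and in the tie case of step 3 simply keeps the block unflipped instead of choosing the polarity that shortens the runlength. All names and the random test blocks are illustrative assumptions.

```python
import random

def block_disparity(bits):
    """Disparity of a block: number of ones minus number of zeros."""
    return sum(1 if b else -1 for b in bits)

def control_running_disparity(blocks):
    """Apply the polarity rule block by block.
    Returns the flip decisions and the running disparity after each block."""
    running = 0
    flips, trace = [], []
    for bits in blocks:
        r = block_disparity(bits)        # step 1: disparity with control bit 0
        flip = running * r > 0           # step 4: same sign, complement block
        running += -r if flip else r     # steps 2 and 3: keep the block as is
        flips.append(flip)
        trace.append(running)
    return flips, trace

rng = random.Random(1)
m = 4
blocks = [[rng.randint(0, 1) for _ in range(m)] for _ in range(1000)]
flips, trace = control_running_disparity(blocks)
print(max(abs(value) for value in trace))
```

In this memoryless model the running disparity never exceeds m in magnitude, because a block is only added when its sign opposes the current running value.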

The disparity of the m output bits at time n is upper bounded by
m; moreover,

R^Sm  +  Lfj 
and 

respectively. 

3. Example

We obtained a family of convolutional codes of rate 3/4 that
satisfy condition (1) by performing row operations on the generator
matrices of the corresponding codes given in Table 11.1 of [3].
These codes have constraint lengths ranging from 3 to 9 and free
Hamming distances ranging from 4 to 8. The dc-free subcodes
obtained by applying our construction procedure to these codes have
rate 2/4, with upper bounds on R_max and K equal to 6 and 7,
respectively.

4. Conclusions

Using an approach similar to that described in [1], we have
developed a procedure for constructing dc-free codes with
error-control capabilities. The codes produced by this procedure are
subcodes of a convolutional code with bounded running disparities
and runlengths. The decoder for one of these codes is simply a
Viterbi decoder for the underlying convolutional code. Whereas the
bounds in [1] increase with the minimum distance of the code, our
bounds are independent of distance. The codes given in [1],
however, generally have higher rates.

References 

[1] R. H. Deng and M. A. Herro, "DC-Free Coset Codes," IEEE
Trans. Inform. Theory, vol. 34, pp. 786-792, July 1988.

[2] J. J. O'Reilly and A. Popplewell, "A Further Note on DC-Free
Coset Codes," IEEE Trans. Inform. Theory, vol. 36, pp. 675-676,
May 1990.

[3] S. Lin and D. J. Costello, Jr., Error Control Coding, Prentice-Hall,
Englewood Cliffs, N.J., 1983.




CONCATENATED  CODING  FOR  BINARY  PARTIAL-RESPONSE  CHANNELS 


Giovanni Cherubini and Sedat Ölçer

IBM Research Division, Zurich Research Laboratory
CH-8803 Rüschlikon, Switzerland


ABSTRACT 

Concatenation of outer conventional rate-kc/nc binary convolutional
coding with inner rate-k/n special trellis coding for binary
partial-response 1−D channels is investigated. In the considered
scheme, the code sequence generated by the convolutional code is
time-interleaved prior to trellis encoding. Decoding for the outer
convolutional code takes into account the reliability of individual
code symbols provided by the inner trellis-decoding stage. The
trellis code is designed to achieve large minimum Euclidean distance.
The reliability of the decoded symbols is obtained in a
computationally efficient way. The construction of the trellis code is
based on the partitioning of a set of noiseless channel-output
sequences into subsets which are assigned to the branches of a
combined encoder and channel trellis. An algorithm for constructing
the subsets of channel-output sequences is discussed.
Trellis codes with various rates, minimum distances, and complexities
are described and the performance achieved by concatenation
with different convolutional codes is presented. It is shown that
substantial coding gains can be achieved by this method as compared
to the maximum-likelihood sequence detection of uncoded
binary signals.

SUMMARY 

It is well known that large coding gains can be achieved by
concatenated coding [1]. When two concatenated codes are employed
in conjunction with time-interleaving, decoding for the inner code
takes place first. If decoding for the outer code is based on soft
decisions, increased immunity against noise in the order of 2 dB
can be obtained over decoding schemes using hard decisions only
[2].

This paper investigates concatenated coding for binary
transmission over partial-response channels described by the
time-discrete transfer function 1 − D. A conventional rate-kc/nc binary
convolutional code is employed as an outer code. The inner
rate-k/n trellis code is designed to achieve large minimum Euclidean
distance and is constructed so that the reliability of the
decoded symbols is obtained in a computationally efficient way.

The sequence of information bits is mapped into a sequence of
binary symbols by convolutional encoding. The binary symbols are
then time-interleaved and the obtained sequence {b_n}, b_n ∈ {0,1},
is input to the trellis encoder. The trellis encoder provides a
sequence of binary channel-input signals {a_n}, a_n ∈ {−1, +1}. At
the output of the channel,

$z_n = a_n - a_{n-1} + w_n$, (1)


sequences of length n. This set is partitioned into M subsets, each
subset containing 2^{k1} sequences. The sequences within a subset
correspond to paths through the 1 − D channel trellis that start
from a common state and end, after n transitions, in a common
state. Sequences within a subset are separated by a minimum
Euclidean distance of d_min. The subsets are then assigned to the
branches of a combined encoder and channel trellis with ν states
according to Ungerboeck's criterion [3]. There are 2^{k2} branches
leaving from and entering into each trellis state. Therefore, in the
encoder for the trellis code, k1 input bits select one of 2^{k1} parallel
transitions between two consecutive encoder states, and k2 input
bits determine a state transition in the encoder trellis.

The free Euclidean distance of the inner trellis code is given by

$d_{free} = \min(d_{min}, d'_{min})$, (2)

where d'_min is the minimum distance of error events of length
greater than one trellis step. An algorithm for constructing the set
of channel-output sequences is discussed.

For computation of the reliability information relative to the
symbols b_n, either maximum-a-posteriori symbol estimation or the
soft-output Viterbi algorithm (SOVA) can be used [2]. In the case
k2 = 1 and d_min < d'_min, a simple algorithm is obtained. At each
decoding step, the reliability of the k1 symbols on the parallel
transitions is computed by using the associated branch metrics.
The reliability of the remaining symbol is obtained by the SOVA.

Trellis codes with various rates, minimum distances and complexities
are presented. Their concatenation with different convolutional
codes is investigated. Overall performance is measured in terms of
the asymptotic coding gain (ACG) over the baseline system [4],
which is expressed in dB as

$\mathrm{ACG} = 10 \log_{10}(\cdots)$, (3)

where $d_H$ is the free Hamming distance of the outer convolutional
code, and $R = (k_c k)/(n_c n)$ is the overall code rate. For
example, concatenation of a 32-state, rate-2/3, $d_H = 6$ convolutional
code with a 4-state, rate-3/6, $d^2_{free} = 24$ trellis code gives
R = 1/3 and ACG = 7.8 dB. Concatenation of a rate-3/4 convolutional
code with 32 states and $d_H = 5$ with the trellis code of
the previous example gives R = 3/8 and ACG = 7.5 dB.
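
The two quoted gains can be checked numerically. The argument of the logarithm in (3) is not legible in this copy, so the sketch below assumes the form $R \, d_H \, d^2_{free} / d^2_0$ with a baseline squared distance $d^2_0 = 8$. Both the form and the constant are assumptions, chosen only because they reproduce the two examples above; the function name `acg_db` is hypothetical.

```python
import math
from fractions import Fraction

def acg_db(outer_rate, inner_rate, d_hamming, d_free_sq, baseline_sq=8):
    """Candidate asymptotic coding gain in dB (assumed form, not the source's)."""
    overall_rate = outer_rate * inner_rate            # R = (k_c k) / (n_c n)
    gain = overall_rate * d_hamming * d_free_sq / baseline_sq
    return overall_rate, 10 * math.log10(float(gain))

# Example 1: rate-2/3 outer code with d_H = 6, rate-3/6 inner code with d_free^2 = 24
rate1, gain1 = acg_db(Fraction(2, 3), Fraction(3, 6), 6, 24)
# Example 2: rate-3/4 outer code with d_H = 5, same inner trellis code
rate2, gain2 = acg_db(Fraction(3, 4), Fraction(3, 6), 5, 24)
print(rate1, round(gain1, 1))
print(rate2, round(gain2, 1))
```

Under this assumed form the overall rates come out as 1/3 and 3/8, matching the text.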
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where $\{w_n\}$ is a sequence of white Gaussian noise samples.
Decoding for the inner trellis code is performed using the
sequence of unquantized channel-output signals $\{z_n\}$. In this
decoding stage, reliability information associated with the binary
symbols $b_n$ is computed employing the combined encoder and
channel trellis. The sequence of reliability information is then
deinterleaved and used to perform soft-decision decoding for the
outer convolutional code.

A v-state, rate-k/n trellis code is constructed as follows. Let
$k = k_1 + k_2$. Consider a set of noiseless 1 - D channel-output
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Abstract 

A coset of a convolutional code may be used to generate a zero-run
length limited trellis for a (1 - D) partial-response channel. It is
well known that the free Hamming distance $d_H$ of the convolutional
code and the free squared euclidean distance $d^2_{free}$ of the coset,
measured at the channel output, are related by $d^2_{free} \ge d_H$. In this
talk we present cosets for which $d^2_{free}$ is larger than the free Hamming
distance of any convolutional code with the same rate and constraint
length. We also describe how the new bounds, described in [1, 2],
on the maximum zero-run length of cosets may be used to ensure
a short zero-run length at the channel output. Analytical arguments,
supported by results from computer search, suggest that cosets with
large free squared euclidean distance also have short maximum
zero-run lengths.

Introduction 

Several authors (among them Wolf and Ungerboeck [3], and
Calderbank, Heegard and Lee [4]) have described recording systems
in which the recording channel is regarded as a partial response
channel with transfer polynomial of the form $(1 - D)^N$,
where $N \in \{1, 2\}$. In [3], the binary information sequence is
encoded by an error-correcting code, sent through a channel precoder
and subsequently through a (1 - D) partial response channel.
The precoder essentially inverts the channel transfer function.
The overall channel accepts a binary ({0,1}) input sequence
$x = (\ldots, x_{i-1}, x_i, x_{i+1}, \ldots)$ and produces a ternary ({-1,0,1})
output sequence $y = (\ldots, y_{i-1}, y_i, y_{i+1}, \ldots)$, where $|y_i| = x_i$ for all i
and the signs alternate.
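
The input/output relation above can be illustrated with a minimal sketch. It assumes the usual modulo-2 precoder $p_i = x_i \oplus p_{i-1}$ and a zero initial state; neither detail is spelled out in the text, and the function names are hypothetical.

```python
def precode(x, state=0):
    """Assumed precoder over GF(2): p_i = x_i XOR p_{i-1}."""
    out = []
    for bit in x:
        state ^= bit
        out.append(state)
    return out

def one_minus_d(p, prev=0):
    """Noiseless 1 - D channel: y_i = p_i - p_{i-1}."""
    y = []
    for sym in p:
        y.append(sym - prev)
        prev = sym
    return y

x = [1, 0, 1, 1, 0, 0, 1]
y = one_minus_d(precode(x))
nonzero = [v for v in y if v != 0]
print(y)
```

Each output symbol has magnitude equal to the corresponding input bit, and consecutive nonzero outputs carry opposite signs.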

Codes for such channels should have

(1) large free squared euclidean distance, where the squared
euclidean distance between two output sequences $y$, $y'$ is defined
as $\sum_i (y_i - y'_i)^2$, and

(2) short maximum zero-run length, defined as the maximum number
of consecutive zeros in a code word.

The  squared  euclidean  distance  between  any  pair  of  noiseless 
output  sequences  is  at  least  as  large  as  the  corresponding  Hamming 
distance  between  the  original  convolutional  code  words.  In  [3],  a 
binary  convolutional  code  with  large  free  Hamming  distance  is  used 
for  error  correction.  In  order  to  satisfy  requirement  (2),  a  coset  of 
the  code  is  used  rather  than  the  linear  code  itself. 
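
The distance inequality stated above can be checked exhaustively for short blocks. The sketch below is only an illustration under stated assumptions: the precoder is taken to be $p_i = x_i \oplus p_{i-1}$ with zero initial state, and it compares all binary input words of length 6 rather than the words of any particular convolutional code.

```python
from itertools import product

def channel_output(x):
    """Precoded noiseless 1 - D channel, zero initial state (assumed)."""
    state, y = 0, []
    for bit in x:
        new_state = state ^ bit
        y.append(new_state - state)
        state = new_state
    return y

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

def sq_euclid(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

n = 6
words = list(product([0, 1], repeat=n))
# Collect every pair whose output distance falls below the input Hamming distance.
violations = [
    (a, b) for a in words for b in words
    if sq_euclid(channel_output(a), channel_output(b)) < hamming(a, b)
]
print(len(violations))
```

No violating pair exists: wherever two inputs differ, one output symbol is 0 and the other is +1 or -1, so that position contributes at least 1 to the squared euclidean distance.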

New  Codes 

We  present  new  codes  for  such  channels.  As  in  [3],  the  new  codes 
are  cosets  of  convolutional  codes.  For  example,  consider  the  (3,1) 
convolutional  code  with  constraint  length  1  and  parity-check  matrix 

1 0 

The  convolutional  code  has  free  Hamming  distance  3,  as  can  be 
seen  from  the  trellis  of  the  convolutional  code  in  the  figure.  We  obtain 
one  coset  of  the  code  by  adding  the  vector  (010)  to  each  branch  label. 


The decoder trellis for the precoded (1 - D) partial-response channel
is also shown in the figure.


We exploit the underlying linearity of the decoder trellis to
calculate $d^2_{free}$ (14 in the example), or a lower bound on it. The
resulting algorithm, which is related to one described by Zehavi and
Wolf [5], was used to find cosets with large $d^2_{free}$. A few examples
are provided in the table below.


Rate          log2 of #        $d^2_{free}$   Best comparable    Max zero-run
              decoder states                  Hamming distance   length

1/3           5                ≥ 24           13                 2
2/4           3                ≥ 8            6                  3
2/4           6                ≥ 14           10                 3
3/5           2                6              4                  5
3/5           4                ≥ 8            6                  5
4/6           3                6              4                  7
7/9 (Punct.)  3                ≥ 4            3                  8
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ABSTRACT 

This paper deals with Reed-Muller (RM) coding and concatenated
soft-decision decoding for binary partial-response class-IV (PRIV) channels.
Block interleaved RM codewords are transmitted with precoding over
the PRIV channel. In the receiver, soft-decision decoding is accomplished
in two stages. An inner decoder accounts for the precoded transmission
of binary symbols over the PRIV channel. Approximate
log-likelihood ratios for individual code symbols are determined by a
new bi-directional symbol estimation algorithm. The obtained soft
information values are deinterleaved and passed to the outer decoding stage.
With sufficient interleaving, these values represent the appropriate
metrics for soft-decision RM decoding. An efficient suboptimal
decoding algorithm based on the generalized multiple concatenation
(GMC) structure of RM codes is employed. Real coding gains over
uncoded transmission with maximum-likelihood sequence decoding were
determined by simulation. Results are presented for various RM code
parameters and degrees of interleaving. Encoder and decoder realization
as well as complexity issues are addressed. A comparison with other
coding schemes known for PRIV channels shows significant advantages
of the scheme considered here in terms of coding gains versus decoding
complexity.

SUMMARY 

Coding techniques for binary partial-response class-IV (PRIV) channels
with time-discrete transfer function $1 - D^2$ have been studied, e.g., in
[1,2]. In [1], convolutional codes are used in conjunction with precoding.
The free Euclidean distance between sequences of noiseless channel-
output signals is then essentially given by the Hamming distance of the
convolutional code employed. With the matched spectral-null codes
described in [2], gains in free Euclidean distance are achieved by sending
constrained sequences with zero spectral energy at the null frequencies
of the PRIV channel.

In this paper, we investigate the application of Reed-Muller (RM)
block codes. Information bits are encoded by a systematic RM encoder,
block interleaved, and then transmitted over the PRIV channel with
precoding. In the receiver, concatenated soft-decision decoding [3] is
employed. The inner decoding stage accounts for the precoded transmission
of binary symbols over the noisy PRIV channel and derives soft
information values for these symbols. These values are deinterleaved and
passed to the outer decoding stage for soft-decision RM decoding. The
systematically encoded information bits are then immediately available
from the recovered RM codewords.

We denote the interleaved RM code symbols by $b_k$, $k = \ldots, 0, 1, 2, \ldots$,
and represent these symbols in the bipolar form $b_k \in \{-1, +1\}$ with the
mapping logical 0 → -1, and logical 1 → +1. The precoder generates
the binary transmit symbols $a_k = -b_k a_{k-2}$. The output signals of the
noisy PRIV channel become


(approximate) maximum a posteriori (MAP) estimation of the transmit
symbols $a_k$ [3], Viterbi algorithm path-metric computations are performed
both forward and backward in time. The quantities $\gamma_k$ are then
obtained by suitably combining forward and backward difference
metrics. It is apparent from (2) that signals with even and odd time
indices can be processed independently. For practical reasons, two
known even- and odd-indexed transmit symbols $a_k$ are inserted between
interleaving blocks to provide starting points for the forward and
backward recursions. In the outer decoding stage, the transmitted information
bits are recovered from the deinterleaved values $\gamma_k$ by an efficient
suboptimal soft-decision decoding algorithm for RM codes recently
described in [4,5].

A binary RM code R(r,m), for $0 \le r \le m$, exhibits the code parameters

$[\, n = 2^m,\ k = \sum_{i=0}^{r} \binom{m}{i},\ d = 2^{m-r} \,]$. (3)
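
The parameters in (3) are easy to evaluate directly. The following sketch (the function name `rm_parameters` is hypothetical) computes n, k and d and can be compared with the R(5,9) = [512,382,16] code quoted in this summary.

```python
from math import comb

def rm_parameters(r, m):
    """Return (n, k, d) of the binary Reed-Muller code R(r, m), 0 <= r <= m."""
    if not 0 <= r <= m:
        raise ValueError("need 0 <= r <= m")
    n = 2 ** m                                   # code length
    k = sum(comb(m, i) for i in range(r + 1))    # number of information bits
    d = 2 ** (m - r)                             # minimum Hamming distance
    return n, k, d

print(rm_parameters(5, 9))
```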

Codewords can be generated either according to a well-structured k x n
generator matrix or as codewords of length n - 1 of a cyclic code
extended by adding one even-parity check bit [6]. The first interpretation
implies a generalized multiple concatenation (GMC) structure, defined
by the iterative construction

$R(r+1, m+1) = \{\, |u|u \oplus v| : u \in R(r+1, m),\ v \in R(r, m) \,\}$. (4)
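
The iterative construction (4) can be turned into a recursive generator-matrix builder. This is a minimal sketch for small codes only; the base cases (repetition code for r = 0, full space for r = m) and the brute-force distance check are choices made here for illustration, not the decoding algorithm of [4,5].

```python
from itertools import product

def rm_generator(r, m):
    """Generator matrix (list of 0/1 rows) of R(r, m) via the |u|u+v| rule."""
    n = 2 ** m
    if r == 0:
        return [[1] * n]                               # repetition code
    if r == m:
        return [[int(i == j) for j in range(n)] for i in range(n)]  # full space
    half = n // 2
    top = [row + row for row in rm_generator(r, m - 1)]             # |u|u|
    bottom = [[0] * half + row for row in rm_generator(r - 1, m - 1)]  # |0|v|
    return top + bottom

def min_distance(gen):
    """Minimum weight over all nonzero codewords (feasible for small k only)."""
    n = len(gen[0])
    best = n
    for coeffs in product([0, 1], repeat=len(gen)):
        if not any(coeffs):
            continue
        word = [0] * n
        for c, row in zip(coeffs, gen):
            if c:
                word = [a ^ b for a, b in zip(word, row)]
        best = min(best, sum(word))
    return best

g13 = rm_generator(1, 3)
g24 = rm_generator(2, 4)
print(len(g13), len(g13[0]), min_distance(g13))
print(len(g24), len(g24[0]), min_distance(g24))
```

For these two small codes the number of rows, the length and the minimum distance agree with the parameters given in (3).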

The GMC structure has been exploited to devise the soft-decision
decoding algorithm [4,5] employed in the outer decoding stage.
Encoding could be based on the same structure, but a systematic encoder
would then not be easily obtained. The extended cyclic-code interpretation
of RM codes is preferred as a basis for systematic encoding by
simple shift-register circuits. The two interpretations of RM codes lead
to different orderings of the code symbols. The reordering required to
rearrange the code symbols in the extended cyclic code for decoding by
the GMC-based decoding algorithm is explained.

Simulation results are presented which show significant real coding
gains obtained over uncoded binary PRIV transmission with optimum
maximum-likelihood sequence decoding. We find, for example, that with
an R(5,9) = [512,382,16] code a real coding gain of 4 dB is obtained in
terms of $E_b/N_0$ at a bit-error rate of $10^{-\cdots}$. In this case, only 23
elementary arithmetic operations are needed for RM decoding per information
bit. Interleaving plays a lesser role for RM codes with large Hamming
distance. With convolutional encoding as described in [1], a similar real
coding gain could only be achieved with an R = 3/4, v = 8 convolutional
code and a 512-state Viterbi decoder. With binary spectral-null codes
[2], coding gains of 4 dB cannot be achieved even asymptotically. The
approach pursued in this study for RM codes can be used for convolutional
coding as well. Further investigation is needed.


$z_k = a_k - a_{k-2} + w_k$, (1)

where $w_k$ accounts for additive i.i.d. Gaussian noise. If sufficient
interleaving is employed, the channel-output signals containing information
about one particular code symbol $b_k$ become essentially independent of
the other spread-out code symbols belonging to the same codeword.
Hence, the log-likelihood ratios

$\rho_k = \ln \frac{p(z_{-\infty}^{\infty} \mid b_k = +1)}{p(z_{-\infty}^{\infty} \mid b_k = -1)}$, (2)

where $z_{-\infty}^{\infty} = \ldots, z_0, z_1, z_2, \ldots$ denotes the infinite sequence of observed
channel-output signals, represent appropriate metrics for soft-decoding of
RM codewords.
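
The effect of the precoder $a_k = -b_k a_{k-2}$ on the channel output (1) can be seen in a short noiseless simulation. This is a sketch only: the two starting symbols in `a_init` are assumed values, the noise term $w_k$ is omitted, and the function name is hypothetical.

```python
def priv_noiseless(b, a_init=(1, 1)):
    """Precode bipolar symbols b_k and pass them through a noiseless 1 - D^2 channel.

    a_init holds the two assumed starting symbols (a_{-2}, a_{-1}).
    """
    a = list(a_init)
    z = []
    for bk in b:
        ak = -bk * a[-2]         # precoder: a_k = -b_k * a_{k-2}
        z.append(ak - a[-2])     # channel output (1) without the noise term w_k
        a.append(ak)
    return a[2:], z

b = [+1, -1, -1, +1, +1, -1, +1, +1]
a, z = priv_noiseless(b)
print(a)
print(z)
```

In the noiseless case $|z_k| = 1 + b_k$, so $z_k = 0$ corresponds to logical 0 and $z_k = \pm 2$ to logical 1, and the even- and odd-indexed subsequences never interact.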

In the inner decoding stage, soft information values $\gamma_k \approx \rho_k$ are
computed by a new algorithm for precoded-symbol estimation in the
presence of intersymbol interference caused by the PRIV channel. As for
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1  Introduction 


Single bit error correcting and double bit error detecting (SEC-DED)
codes are widely used in computer semiconductor memory systems organized
in a one-bit-per-chip manner. This is because in this organization any failure
in one chip can corrupt, at most, one bit per codeword. Recently some systems
adopt a b-bit-per-chip organization, where $b \ge 2$ [1]. A chip failure in these
systems causes the word read-out to have a b-bit block, called byte, in error.
Therefore, SbEC-DbED codes, capable of correcting all single b-bit byte errors
and detecting double b-bit byte errors, have found applications in this kind of
systems. Among the predominant errors, however, are the soft errors induced
by $\alpha$ particles, which are said to be apt to manifest themselves as single
bit errors still in byte organized systems. Consequently, these systems need
protection against a single bit soft error lined up in a codeword with another
existing single byte hard error due to a chip failure.

From the standpoint mentioned above, this paper proposes a new class of
linear codes, called single b-bit byte error correcting and single b-bit byte and
single bit error detecting codes, or SbEC-(Sb+S)ED codes. This class of codes
can correct all single byte errors and detect any error that corrupts both a
single byte and a single bit in a codeword.

In the later sections, bounds and construction methods of SbEC-(Sb+S)ED
codes are given, and it is shown that some codes proposed in this paper meet
a lower bound on check bit length.


2  SbEC-(Sb+S)ED Codes

In this paper, codewords of N bits length are divided into n bytes, where
all the bytes are b-bit wide except the last one which is allowed to have c-bit
width ($0 < c \le b$). The following notations are used in this paper:

$b_i$          byte length of the ith byte, i.e.,
               $b_i = b$ for $i = 1, 2, \ldots, n - 1$ and $b_n = c$.
n              code length in byte.
N              code length in bit, i.e., $N = b(n - 1) + c$.
K              information bit length.
R              check bit length, i.e., $R = N - K$.
(N, K) code    linear code of code length N and information bit length K.
$0_d$          vector composed of d 0's.
               d is omitted if there is no fear of confusion.
$z^t$          transpose of vector z.
[XY]           concatenation of vectors/matrices X and Y.

All the matrices and vectors in this paper are over GF(2) and vectors are
row vectors unless referred to otherwise. The parity check matrix of an
(N, K) SbEC-(Sb+S)ED code is expressed as $H = [H_1 H_2 \cdots H_n]$, where $H_i$ is
an $(N - K) \times b_i$ matrix for $i = 1, 2, \ldots, n$.

SbEC-(Sb+S)ED codes are a class of single b-bit byte error correcting codes,
which can detect any double byte error such that, at least, one of the two byte
errors has Hamming weight one. Fig. 1 shows examples of a correctable error
and a detectable error of S4EC-(S4+S)ED codes.


[Figure: two panels, "correctable error" (a byte error) and "detectable error"
(a byte error together with a bit error).]

Fig. 1. Examples of a correctable error and a detectable one
for b = 4 and N = 15.


In the linear case, the definition of the codes is equivalent to the following
theorem.

Theorem 2.1  A linear (N, K) code with parity check matrix $H =
[H_1 H_2 \cdots H_n]$ is an SbEC-(Sb+S)ED code iff

$\forall i, j \in \{1, 2, \ldots, n\}\ (i \ne j),\ \forall e_1 \in GF(2)^{b_i},\ e_2 \in GF(2)^{b_j}$,

$[e_1 e_2] \ne 0 \ \rightarrow\ H_i e_1^t + H_j e_2^t \ne 0^t$,

and $\forall i, j, k \in \{1, 2, \ldots, n\}\ (i \ne j \ne k \ne i),\ \forall e_1 \in GF(2)^{b_i},\ e_2 \in GF(2)^{b_j},\ e_3 \in GF(2)^{b_k}$,

$e_3$ has Hamming weight one $\ \rightarrow\ H_i e_1^t + H_j e_2^t + H_k e_3^t \ne 0^t$.  □

3  Bounds 

Theorem 3.1  Linear (N, K) SbEC-(Sb+S)ED codes satisfy

$R = N - K \ge 2b + 1$,

$R = N - K \ge b + \lceil \log_2 (N - 2b + 2^b) \rceil$,

and $R = N - K \ge \lceil \log_2 [(2^b - 1)\{N(b + 1) - b^2 - c\}/b + 2^b] \rceil$.  □

The first inequality in the theorem corresponds to the Singleton bound, the
second and the last ones to the Hamming bound [2]. Roughly speaking, the second
bound is tighter than the last when N is relatively small, and vice versa.

4  Code  Construction  Methods 

Two construction methods of SbEC-(Sb+S)ED codes are given in this section.
The first one derives codes of arbitrary byte length and code length,
while the second one lacks flexibility for code length. The second one,
however, provides more efficient codes than the first one.

Construction  Method  1 

The following procedure derives SbEC-(Sb+S)ED codes from Sb'EC codes,
where $b' = b - 1$.

1) Let $H' = [H'_1 H'_2 \cdots H'_n]$ denote the $R' \times N'$ parity check matrix of an
Sb'EC code. Given the code length N', integers n, b' and c' are defined in the
same way as in Section 2, so that $N' = b'(n - 1) + c'$. If $b' = 1$, an Sb'EC code
should be regarded as a simple SEC code.

2) An $R' \times (N' + n)$ matrix $H = [H_1 H_2 \cdots H_n]$ is obtained by $H_i = H'_i \cdot
[I_{b'_i} f_i]$, $i = 1, 2, \ldots, n$, where $[I_{b'_i} f_i]$ denotes the $b'_i \times b'_i$ identity matrix $I_{b'_i}$
followed by a $b'_i$-dimensional even weight column vector $f_i$.

3) Let an $R'' \times (b'_i + 1)$ matrix $U_i = [u_i u_i \cdots u_i]$ denote the collection of
the same $b'_i + 1$ odd weight column vectors $u_i$, for $i = 1, 2, \ldots, n$. Take
$U = [U_1 U_2 \cdots U_n]$ consisting of those $U_i$'s, where $u_i \ne u_j$ for $i \ne j$,
$i, j \in \{1, 2, \ldots, n\}$.

4) Finally, the null space of the following matrix is an SbEC-(Sb+S)ED
code of byte length $b = b' + 1$, code length $N = N' + n$ and check bit length
$R = R' + R''$:

$\begin{bmatrix} H \\ U \end{bmatrix} = \begin{bmatrix} H_1 & H_2 & \cdots & H_n \\ U_1 & U_2 & \cdots & U_n \end{bmatrix}$

Theorem 4.1  The codes obtained with the above procedure are
SbEC-(Sb+S)ED codes.  □

Construction  Method  2 

Theorem 4.2  The null space of the following matrix is an ($N = b 2^b + 3b + 1$,
$K = b 2^b$) SbEC-(Sb+S)ED code:

[Parity check matrix: a block matrix whose entries are the identity matrix I,
the zero matrix O, powers of T, zero vectors and the all-ones vector 1; the
block layout is not recoverable from this copy.]

where $q = 2^b$, I is the $b \times b$ identity matrix, O is the $b \times b$ zero matrix, $1_d$ is
the vector composed of d 1's, and the $b \times b$ matrix T is the companion matrix
of a primitive polynomial of degree b [1] [3].  □

5  Evaluation 


For any byte length $b \ge 2$, the construction method 1 shown in the previous
section provides SbEC-(Sb+S)ED codes which meet the first bound in
Theorem 3.1. For the practical code parameters of b = 4 and K = 64, in particular,
Theorem 4.2 gives an SbEC-(Sb+S)ED code of check bit length R = 13, while
the previously known SbEC-DbED codes require, at least, 14 check bits [4].
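
The quoted check bit length can be reproduced from the code parameters of Construction Method 2. The sketch below reads those parameters as $N = b 2^b + 3b + 1$ and $K = b 2^b$; because the byte length b is badly garbled in this copy, that reading is an assumption, and the function name is hypothetical.

```python
def method2_parameters(b):
    """Code length N, information length K and check length R for byte length b.

    Assumes N = b * 2**b + 3 * b + 1 and K = b * 2**b (reading of a garbled theorem).
    """
    K = b * 2 ** b
    N = K + 3 * b + 1
    return N, K, N - K

print(method2_parameters(4))
```

For b = 4 this gives K = 64 and R = 13, in agreement with the figures stated in the evaluation.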

6  Conclusion 

This paper has proposed a new class of error control codes, SbEC-(Sb+S)ED
codes, capable of correcting all single byte errors and detecting any error
that corrupts both a single byte and a single bit in a codeword, suitable for
semiconductor memory systems organized in a b-bit-per-chip manner.
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ABSTRACT.  We  consider  codes,  consisting  of  sequences 

$0^{a_1} 1 \, 0^{a_2} 1 \cdots 0^{a_N} 1$, where $d \le a_j \le k$,

and call them (d,k)-codes of reduced length N. We introduce a
definition of arbitrary (d,k)-codes and perfect (d,k)-codes capable of
correcting single peak-shifts of given size t. For the construction
of perfect codes we use a general combinatorial method
connected with finding "good" weight sequences in Abelian
groups, and introduce the concept of perfect t-shift N-designs.
We give explicit constructions of such designs for t=1, t=2 and
t=(p-1)/2, where p is a prime. Our construction is not only
effective, but also universal in the sense that it does not depend
on the (d,k)-constraints. It also allows automatic correction of
those peak-shifts that violate the (d,k)-constraints. Furthermore,
our construction is naturally extended to (d,k)-codes of fixed
binary length and allows the determination of the beginning of
the next code word. The question whether the designed codes
can be represented as systematic codes with minimal
redundancy is considered as well. In particular, for any
(d,k)-code with n q-ary ($q = k - d + 1 > 2$) information digits we
give a method of finding r q-ary check digits such that the
resulting systematic code is capable of correcting single
peak-shifts of size 1, where r is determined uniquely by
$q^{r-1} - 2(r-1) < 2n + 1 \le q^r - 2r$. This code is perfect if $2n + 1 = q^r - 2r$.

INTRODUCTION 

In high-density magnetic recording systems Run Length
Limited (RLL) sequences are used to increase density and
control self clocking [1]. The read-out mechanism detects
changes in magnetization and thus from the RLL sequence we
can derive a so-called (d,k)-sequence, where d+1 and k+1
correspond to the minimum and maximum length of the RLL
substrings, respectively. A (d,k)-sequence is represented by
consecutive zero symbol runs of length i, $d \le i \le k$, between pairs of
one symbols. Read-out circuitry imperfection and clock jittering
may cause misdetection of transitions and are supposed to result
in peak-shifts left or right in the (d,k)-sequence.
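
The run-length view of a (d,k)-sequence and the effect of a peak-shift can be sketched as follows. The helper names are hypothetical, and the convention that each block ends with a one is taken from the sequence form $0^{a_1} 1 \cdots 0^{a_N} 1$ used in the abstract.

```python
def runs_to_bits(runs):
    """Map run lengths (a_1, ..., a_N) to the binary string 0^{a_1} 1 ... 0^{a_N} 1."""
    return "".join("0" * a + "1" for a in runs)

def bits_to_runs(bits):
    """Inverse map; the string is assumed to end with a 1."""
    return [len(chunk) for chunk in bits.split("1")[:-1]]

def satisfies_dk(runs, d, k):
    return all(d <= a <= k for a in runs)

def shift_peak(runs, i, s):
    """Shift the i-th one (0-based) by s positions to the right (left if s < 0).

    The run before the peak grows by s and the run after it shrinks by s;
    for the last peak the following run belongs to the next block.
    """
    out = list(runs)
    out[i] += s
    if i + 1 < len(out):
        out[i + 1] -= s
    return out

runs = [2, 3, 1]                      # a (1,3)-sequence with N = 3 substrings
print(runs_to_bits(runs))
print(runs_to_bits(shift_peak(runs, 0, 1)))
print(satisfies_dk(shift_peak(runs, 1, 1), 1, 3))
```

The last line illustrates that a peak-shift may produce a sequence that violates the (d,k)-constraints.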

Shamai and Zehavi [2] give bounds on the capacity of the
bit-shift magnetic recording channel. Kolesnik and Krachkovski
[3] obtained asymptotic bounds on the achievable rates of
bit-shift error-correcting codes. Fredrickson and Wolf [4]
introduced a class of single bit-shift detecting codes. Codes
designed specifically to cope with a single bit shift and multibit
shifts of a single position are discussed by Kuznetsov and Vinck
and by Ytrehus in [5,6], respectively. Ferreira and Lin [7] give
code constructions based on the representation of constrained
sequences as integer compositions. Abdel-Ghaffar and Weber
[8] extend the results given in [4,6]. We discuss the design of
encoding and decoding for the multibit peak-shift channel.

We give a definition of a multibit peak-shift and a general
definition of a code capable of correcting single peak-shifts of
size t. We concentrate on codes C consisting of (d,k)-sequences,
and call them (d,k)-codes. For (d,k)-codes with k-d>2t we
introduce the concept of a perfect code capable of correcting
single peak-shifts of size t. We remark that the problem of
constructing maximum (d,k)-codes is reduced to the same
problem for (d,k)-codes with a fixed number N of substrings.

We give a universal and effective construction of
(d,k)-codes capable of correcting single peak-shifts of size t.
The construction is universal in the sense that it does not
depend on the (d,k)-constraints and, in particular, allows the
correction of single peak-shifts of size t which disturb these
constraints. The main idea of the construction consists of using a
finite Abelian group G of order m to partition any code C into
m subcodes, each having the desired error correcting properties
[9]. Since at least one of these subcodes has size at least |C|/m,
this construction is efficient if the order of the group G is
sufficiently small. In the framework of this construction we
reduce the problem of finding perfect (d,k)-codes of reduced
length N capable of correcting single peak-shifts of size t to the
problem of finding "good" weight sequences in Abelian groups
and introduce the concept of perfect t-shift N-designs.
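
The partition idea can be illustrated for t = 1 with the cyclic group $Z_{2N+1}$. The weight sequence below is a hypothetical choice made for this sketch (it is not claimed to be the authors' design): it is picked so that the 2N possible single peak-shifts of size 1 change the syndrome $\sum_i w_i a_i \bmod m$ by the 2N distinct nonzero values $\pm 1, \ldots, \pm N$. A shift of the last peak is modelled as changing only $a_N$.

```python
from itertools import product

def weights(N):
    """Hypothetical weights in Z_{2N+1}: w_i - w_{i+1} = i for i < N, w_N = N."""
    m = 2 * N + 1
    w = [0] * (N + 1)                       # 1-based; w[0] unused
    w[N] = N % m
    for i in range(N - 1, 0, -1):
        w[i] = (w[i + 1] + i) % m
    return w[1:], m

def syndrome(runs, w, m):
    return sum(wi * ai for wi, ai in zip(w, runs)) % m

def error_table(N, w, m):
    """Syndrome change caused by shifting peak i (0-based) by s = +1 or -1."""
    table = {0: None}                       # syndrome change 0 means no error
    for i in range(N):
        for s in (+1, -1):
            delta = s * w[i] - (s * w[i + 1] if i + 1 < N else 0)
            table[delta % m] = (i, s)
    return table

def decode(received, target, w, m, table):
    """Undo the single peak-shift indicated by the syndrome difference."""
    hit = table[(syndrome(received, w, m) - target) % m]
    out = list(received)
    if hit is not None:
        i, s = hit
        out[i] -= s
        if i + 1 < len(out):
            out[i + 1] += s
    return out

N, d, k = 3, 1, 3
w, m = weights(N)
table = error_table(N, w, m)
all_words = list(product(range(d, k + 1), repeat=N))
classes = {}
for a in all_words:                         # partition C by syndrome value
    classes.setdefault(syndrome(a, w, m), []).append(a)
largest = max(len(c) for c in classes.values())
print(m, len(all_words), largest)
```

Each syndrome class is a subcode in which every single peak-shift of size 1 is corrected by `decode`, including shifts that violate the (d,k)-constraints, and the largest class has at least |C|/m words by the pigeonhole argument in the text.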

We give explicit constructions of perfect t-shift N-designs
for t=1 and any N and for t=(p-1)/2, where p is a prime, and
$N = (p^r - 1)/(p - 1)$. Moreover, we find the necessary and sufficient
conditions for the existence of perfect 2-shift N-designs.

We consider the problem of finding the minimum
redundancy r of systematic codes which are contained in the
constructed perfect (d,k)-codes of reduced length N capable of
correcting single peak-shifts of size t. This problem is connected
with the existence of a particular ordering of perfect t-shift
N-designs. We show that the lower bound $r \ge \lceil \log_q (2tN + 1) \rceil$ is
attained in some cases, where $\lceil x \rceil$ is the smallest integer at least
x and q=k-d+1. Furthermore, for any (d,k)-code with n q-ary
information digits we give a method of finding the minimum
number of q-ary check digits such that the resulting systematic
(d,k)-code is capable of correcting single peak-shifts of size 1.
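
For t = 1 the bound $r \ge \lceil \log_q (2N + 1) \rceil$ with N = n + r substrings can be evaluated by a simple search. The sketch below assumes this reading (q-ary digits, N = n + r) and returns the smallest r with $q^r \ge 2(n + r) + 1$, together with a flag telling whether equality holds; the function name is hypothetical.

```python
def check_digits(n, q):
    """Smallest r with 2n + 1 <= q**r - 2r, and whether equality (perfect) holds."""
    if n < 1 or q < 2:
        raise ValueError("need n >= 1 and q >= 2")
    r = 1
    while q ** r - 2 * r < 2 * n + 1:
        r += 1
    return r, q ** r - 2 * r == 2 * n + 1

for n in (1, 2, 10):
    print(n, check_digits(n, 3))
```

With q = 3, the cases n = 2 (N = 4, 2N + 1 = 9) and n = 10 (N = 13, 2N + 1 = 27) meet the bound with equality.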

For an ideal multibit peak-shift channel, decoding errors
that do not occur in the Nth substring do not propagate to
subsequent blocks, as the length of the code word does not
change. However, if a decoding error occurs in the Nth
substring, the first symbol of the next block is in error and thus
we make a decoding error in this block. Only if again in the Nth
substring a decoding error occurs, we may speak of error
propagation. By appropriate code selection we may avoid this
phenomenon. On the other hand, catastrophic error propagation
occurs whenever random errors are involved. These errors ruin
the structure of code words. They insert new phrases or delete
existing phrases in a code word and thus synchronization
regarding the beginning of the first symbol of a code word is
completely lost. One way to solve this problem is to fix the
length of the codeword to a certain value. We construct codes
with a fixed binary length L by considering the union of all code
words of binary length L belonging to the (d,k)-codes of reduced
length N, $L/(k+1) \le N \le L/(d+1)$. The code words of fixed
binary length start with d zeros and end with a symbol equal to
1. These code words can be stored without merging digits.
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Linear  codes  with  ‘good’  parameters  can  be  constructed  from  algebraic  curves  over  finite 
fields.  Since  Goppa’s  original  paper  in  1981,  there  has  been  a  constant  flow  of  research  on 
(1) asymptotic properties of these codes, (2) behaviour of these codes on different types of
curves, (3) efficient decoding.

The construction gives linear q-ary [n, k, d]-codes satisfying
(a) $|n - (q + 1)| \le 2g\sqrt{q}$; (b) $k = m - g + 1$; (c) $d \ge n - m$.

Here g is the genus of the curve and m is a positive integer satisfying $n > m > 2g - 2$. An
important consequence is that d satisfies $n - k + 1 \ge d \ge n - k + 1 - g$.
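
Relations (a) to (c) can be evaluated for a concrete curve. The sketch below uses the Hermitian curve over $F_{16}$ as an example; its genus 6 is a standard value that is not stated in the text, and taking n = 64 of the 65 rational points as code positions is an assumption made for the example. The function names are hypothetical.

```python
def ag_code_parameters(n, m, g):
    """Dimension and distance bounds of an [n, k, d] code from a genus-g curve."""
    if not (n > m > 2 * g - 2):
        raise ValueError("need n > m > 2g - 2")
    k = m - g + 1                  # (b)
    d_lower = n - m                # (c): d >= n - m = n - k + 1 - g
    d_upper = n - k + 1            # Singleton bound
    return k, d_lower, d_upper

def within_hasse_weil(num_points, q, g):
    """Check |N - (q + 1)| <= 2 g sqrt(q) with exact integer arithmetic."""
    return (num_points - (q + 1)) ** 2 <= 4 * g * g * q

q, g = 16, 6                       # Hermitian curve over F_16 (genus is a known value)
num_points = q * 4 + 1             # q * sqrt(q) + 1 = 65 rational points
print(within_hasse_weil(num_points, q, g))
print(ag_code_parameters(64, 20, g))
```

The gap between the two distance bounds equals the genus, which is why curves with many points relative to their genus are of interest.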

The length n of the code is at most the number of rational points on the corresponding
curve C. So it is of interest to study the codes arising from a curve C with a 'large' number
of rational points, that is, points defined over the ground field $F_q$. The known classes are
(1) modular curves; (2) Hermitian curves; (3) Suzuki curves; (4) Ree curves.

The main feature of the modular curves is that they provide a sequence for which
$\lim n/g = \sqrt{q} - 1$. This leads to the asymptotic result bettering the Gilbert-Varshamov
bound. The number of $F_q$-rational points on a Hermitian curve is $q\sqrt{q} + 1$; the number of
$F_q$-points on a Suzuki curve is $q^2 + 1$; the number of $F_q$-rational points on a Ree curve is
$q^3 + 1$. In these last three cases as well, interesting codes are obtained. A common feature
is the great symmetry that these curves enjoy, in the sense of having a large group of
automorphisms.

The geometry of these codes can be explored from two points of view. The large
automorphism group of the curves reflects many interesting geometrical properties. Also a
linear q-ary [n, k, d]-code can be considered as a set of n points in the projective space $P^{k-1}$
with at most n-d of these points in any hyperplane. This connects coding theory problems
on the maximum value of parameters with combinatorial problems in finite projective
spaces on the maximum size of subsets subject to certain intersection conditions.
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We describe polynomially constructible (PC) algorithms
based on curves with many rational points, namely those
curves which are used in the theory of algebraic geometric
codes, but involved here in some other contexts. We shall
develop the following results:

multiplication algorithms

D. and G. Chudnovsky discovered that one can use
curves with many points in order to define fast bilinear
multiplication algorithms in large extensions of a finite field.
If we have a bilinear multiplication algorithm B(n)
expressing the product of two elements in $GF(q^n)$, the relative
multiplicative complexity of B(n) is defined as

$\mu(B(n)) = \frac{m(B(n))}{n}$,

where m(B(n)) is the number of multiplications by non-constant
terms needed in order to perform B(n). Then,
following Shparlinsky, Tsfasman and Vladut, there are
families of algorithms with asymptotic multiplicative
complexity $\mu_q = \liminf \mu(B(n))$ rather small and in any case
finite, for instance $\mu_2 <$ …

dense lattices

Let P be the set of centers of a packing of equal non-overlapping
spheres in the euclidean space $R^n$. Denote by $\delta(P)$
the density of P and by

$\lambda(P) = \frac{1}{n} \log_2 \delta(P)$

the density exponent of P. Elkies and Shioda have given a
general process of construction of dense lattices based on
elliptic curves. The Coxeter-Todd lattice and the Leech lattice can
be constructed in this way; and one gets in high dimensions
some lattices denser than those previously known.

asymptotical bounds for sphere packings

If we have a family of lattices $P(n) \subset R^n$ with $n \to \infty$,
then Minkowski showed that there exist families with
asymptotic exponent $\limsup \lambda(P(n)) \ge -1$; but in fact it is
very difficult to construct asymptotically good families of
lattices, i.e. with finite asymptotic density exponent. By the
use of algebraic curves with many rational points, several
authors (Litsyn, Quebbemann, Rosenbloom, Tsfasman,
Vladut) gave PC constructions of asymptotically
good families of lattices and of sphere packings.

asymptotical bounds for spherical codes

Consider spherical codes X on the unit sphere of $R^n$ with
angular distance $\varphi$. The number

…

is called by Shannon the reliability of such a code, and he
gave a lower bound for the asymptotic reliability of families of
such codes. Then there are PC families of spherical codes
whose reliability is at least one half of the Shannon lower
bound (Lachaud, Stern). There are also families with
asymptotic kissing number > 2/15.
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This talk is devoted to some applications of
algebraic geometry to coding theory other than
algebraic geometry codes. The general scheme of
these applications is converse to the Goppa
construction, which associates a code to some
algebraic geometry data. Here, on the contrary, some
problems in coding theory give rise to certain
algebraic varieties over finite fields, so that these
problems can be formulated as questions about
these varieties (usually concerning their rational
points). One should mention that usually the
algebraic geometry problems arising in this way are
rather subtle; nevertheless there are some cases
where it is possible to solve them using powerful
techniques of modern algebraic geometry, which
leads to rather interesting results in coding theory.

In this talk we consider the following results:

•  complete determination of the covering radius
of BCH codes of large length (both primitive
and non-primitive); this uses the Lang-Weil
bounds for the number of rational points
on a variety over a finite field;

•  complete determination of weights of codes
orthogonal to certain binary and ternary
cyclic codes (the Melas code, certain classical
Goppa codes, the Zetterberg code), which
reduces to counting rational points of certain
elliptic and hyperelliptic curves;

•  complete determination of the weight enumerator
for some of these codes, which depends
on certain calculations with the trace
formula for Hecke operators;

•  direct computation of the weight of certain
subcodes of second order Reed-Muller codes
(without using the MacWilliams identities),
which reduces to the study of a family of
supersingular Artin-Schreier curves.
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A  decade  ago  the  problem  of  decoding 
algebraic-geometric codes looked hardly tangible
and rather far from algebraic geometry. Both
proved to be wrong.

The break-through, started by Justesen during
his visit to Moscow in 1988, last year reached the
point of decoding algebraic-geometric codes up to
half the designed minimum distance.

This illustrious achievement is due to the work
of many mathematicians, including Vladut,
Skorobogatov, Larsen, Havemose, Elbrond Jensen,
Hoholdt, Porter, Krachkovskii, Pellikaan, and
Shen, the final result being obtained by Ehrhard,
Feng, Rao, and Duursma.

The algorithms we have now are both of reasonable
complexity and rather easy to understand.
However, they do involve several specific difficulties
of an algebraic-geometric nature.

In this talk the principal points of these decoding
algorithms will be described for the simplest
example of the curve being the line, with the
difficulties of the general case being pointed out on
the way.
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ABSTRACT 

We present an equivalent form of the decoding
algorithm in [2]. It achieves the designed minimum
distance in Decoding Algebraic-Geometric
Codes. For a wide class of such Codes the algorithm
is described in an elementary way with a
minimum of Algebraic Geometry concepts.

1.  V.D. Goppa, Codes on algebraic curves, Soviet
Math. Dokl., vol. 24, pp. 170-172, 1981.

2.  D. Ehrhard, Achieving the designed error
capacity in Decoding Algebraic Geometric
Codes. To appear in IEEE Transactions on
Information Theory.
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ABSTRACT 

To improve re-use of time/frequency slots in a cellular radio system,
it is desirable for the average interference levels seen by all
users to be made approximately equal. We provide constructions
based on orthogonal latin squares that guarantee different sets of
users to interfere in successive slots. We illustrate how this may
be combined with convolutional coding to provide large performance
improvement with low delay in a slow hopped system.

SUMMARY 

In mobile cellular radio, the dominant impairments are multipath
fading and interference from other mobiles. In conventional
TDMA systems, mobiles are assigned slots which they keep from
frame to frame. The interfering mobiles are assigned slots in the
same way. As interference levels vary widely between slots, the
result is that some mobiles suffer from persistently poor SNR.
Systems are generally designed for 90% or 99% worst case conditions.
Therefore, the result of this uneven interference distribution
is overly conservative restrictions on frequency re-use between
cells, and thus reduced capacity.

If instead the slot assignments were arranged such that different
interferers were encountered in successive frames or slots, then
the worst case error statistics would improve, particularly in
combination with channel coding across the slots or frames. A
number of recent papers [1,2,3] have proposed randomizing the
interference with beneficial results. We provide specific constructions
that lead to good performance with low delay.

The allocation of time/frequency slots to different users in the
same cell, and to users in neighbouring cells, is a combinatorial
problem of some delicacy. We begin by considering a frame to be
an n x n array where the rows correspond to frequency slots, the
columns to time slots, and the array entries to different users.
Each user in each cell has an individual hopping pattern, and the
symbol denoting that user occurs exactly once in each row and
column of the frame, as for example in frequency-hopped systems.
Thus, it is possible to accommodate n different users in each
cell. The combinatorial problem is to allocate hopping patterns in
neighbouring cells so that two users in these neighbouring cells
interfere in at most one time-frequency slot. We show that if n is a
prime power then there are n-1 ways of allocating time-frequency
slots with the desired interference properties. The construction is
based on mutually orthogonal latin squares. Combinatorial
designs associated with latin squares lead to allocation strategies
for TDMA systems and for TDMA systems with frequency diversity.
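
The hopping-pattern property can be checked numerically in the simplest
case of prime n. The sketch below is only an illustration under stated
assumptions, not the construction used by the authors: it takes the
textbook family of squares with entry (a*i + j) mod n for a = 1, ..., n-1,
and the helper names (mols_prime, is_latin, slots_of, max_overlap) are
hypothetical. Prime powers that are not prime would need finite-field
arithmetic, which is omitted here.

```python
def mols_prime(n):
    """Return n-1 mutually orthogonal latin squares of prime order n.

    Square number a (a = 1..n-1) has entry (a*i + j) mod n in row i
    (frequency slot) and column j (time slot); the entry names the user.
    """
    return [[[(a * i + j) % n for j in range(n)] for i in range(n)]
            for a in range(1, n)]


def is_latin(square):
    """True if every user appears exactly once in each row and column."""
    n = len(square)
    users = set(range(n))
    rows_ok = all(set(row) == users for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == users for j in range(n))
    return rows_ok and cols_ok


def slots_of(square, user):
    """Set of (frequency, time) slots that `user` occupies in one frame."""
    n = len(square)
    return {(i, j) for i in range(n) for j in range(n) if square[i][j] == user}


def max_overlap(sq_a, sq_b):
    """Largest number of slots shared by a user of cell A and a user of cell B."""
    n = len(sq_a)
    return max(len(slots_of(sq_a, u) & slots_of(sq_b, v))
               for u in range(n) for v in range(n))


n = 5  # assumed example size; must be prime for this simple formula
squares = mols_prime(n)
worst = max(max_overlap(squares[a], squares[b])
            for a in range(len(squares)) for b in range(a + 1, len(squares)))
print(len(squares), worst)
```

For n = 5 this gives four hopping-pattern allocations, and any two users
taken from different allocations share a single time-frequency slot per
frame.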

Use of these allocation strategies results in independent interference
levels across slots, and therefore channel codes may be used
to provide diversity protection against the resulting variations in
signal to interference ratio. If in addition the slots are at different
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frequencies,  the  code  will  provide  frequency  diversity  protection 
against fades in the signal level. The diversity protection lowers
the signal to interference ratio threshold required for reliable
operation, permitting re-use of all slots in neighbouring cells. As
in CDMA systems, interference levels will now directly depend
on the number of users, with some back-off from the maximum
required for acceptable performance. The reduction in the number
of users implied by the use of a rate 1/m code, m an integer, does
not materially affect the capacity provided m is reasonable, since
coding can take the form of occupying m slots with reduced
power. The interference power is unchanged from the case of
transmitting at the nominal power using 1 in m slots. Another
benefit of coding is increased resistance to noise, and consequently
the average transmitter power requirements are reduced.

Compressed speech presents a challenging channel coding problem.
Delay is a critical parameter, with the maximum acceptable
delay on the order of 20 to 40 ms. For 8 kb/s speech with a 20 ms
delay, there are only 160 information bits. In this time, reasonable
frequency and interferer diversity must be achieved, along with
coding gain. We analyze a slow frequency hopped TDMA
approach involving convolutional codes and differential QPSK.
For slots of 8 to 16 information bearing signals, it is very difficult
to estimate the signal to interference ratio (C/I) in the presence of
rapid multipath fading. Due to the uncertainty in this estimate,
soft decision decoding is in some cases outperformed by hard
decision decoding combined with an erasure-declaring mechanism.
Moreover, for two antenna branches, selection diversity
performs quite well relative to combining based on C/I. Our
results indicate that a conventional TDMA system with coding is
inferior to the CDMA system proposed in [4] at all reasonable
outage probabilities, if re-use of all frequencies is attempted in
every cell. However, a slow-hopped system using the method of
orthogonal latin squares yields as much as a factor of 2 in
increased capacity, depending on the particular assumptions made
about the propagation environment.

Unequal error protection may be of use with compressed speech,
since not every bit has an equal effect on the perceived quality of
the reconstructed speech. We present an example of how this
might be achieved with low delay and minimal extra complexity
cost. We also discuss coding in slow fading environments.
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Abstract 

This paper compares the performances of Direct Sequence
Code Division Multiple Access (CDMA) and Frequency
Hopping (FH) schemes in a cellular multiuser environment.
Our multiuser channel model incorporates the effects of
propagation, frequency selective fading, and interference
among users in the presence of a constrained system bandwidth.
The CDMA and FH systems are compared using
BPSK modulation. The main point of contrast between
these systems is that the orthogonal hopping patterns in a
FH system result in a decreased additive interference power;
however, the frequency spreading nature of CDMA results in
the ability to combat fading. An information theoretic analysis
is presented, which shows that system capacity is larger
for CDMA than for FH. Hence, for this channel, with sufficient
coding the CDMA system can achieve a higher level
of performance than the FH system. However, it is unclear
what level of complexity would be required to achieve such
performance, and what effect such complexity would have
on the practicality of the system.

Summary 

In this paper we compare the performances of direct sequence
Code Division Multiple Access (CDMA) and Frequency
Hopping (FH) in a cellular multiuser environment.
We assume that there is a fixed system bandwidth, B, and
a fixed data rate, R, at which each user communicates. The
normalized traffic of a system, $\rho$, is defined to be the number
of users per sector, $N_s$, divided by the ratio of B to R. We
find that the FH system sees less interference power than
the CDMA system; however, the FH system is susceptible
to frequency selective fades whereas the wide band nature
of CDMA offers a level of diversity to such fading. Thus, a
tradeoff in performance exists: the FH system performs
better at higher levels of traffic with relatively high probability
of bit error, and the CDMA system performs better
at lower levels of traffic, with relatively low probability of
bit error.

At this point, we consider the use of coding and present
an information theoretic analysis. Assuming that there is no
cooperation among the users in the system on the level of
coding, the capacity of the system is defined to be the largest
possible value of normalized traffic, $\rho$, for which each user
in the system can communicate reliably at rate R. The capacity
of FH and CDMA are computed and we find that
the CDMA system has a larger capacity. This is due to the
fact that the FH system does not allow for the use of
very low code rates because, using a low rate code, a relatively
small number of users would occupy the entire system bandwidth
and thus result in a small amount of traffic. It turns
out, from an information theoretic viewpoint, that the
ability of CDMA to use low rate codes is an advantage over
the lower interference power in the FH system.

*  This work is supported by grants from GTE Laboratories
and Pacific Bell, and AFOSR Grant 91-0037.

Since the information theoretic results are obtained
over arbitrarily complicated coding schemes, and thus
arbitrarily long delays, we investigate the performance of specific
coding schemes and the effects of a finite, controlled
delay using interleaving. These results are obtained primarily
through simulation. Making some assumptions about
the vehicle speed and transmitter frequency, we evaluate
performance using a finite amount of interleaving delay by
taking into account the actual amount of correlation among
channel samples as seen from one codeword symbol to the
next. We have evaluated the performance of several repetition
codes as well as an (8,4) bi-orthogonal block code.
These coding schemes perform far below information theoretic
capacity, and yield performance curves for FH and
CDMA that cross, with FH performing better at higher levels
of traffic with relatively high probability of bit error, and
the CDMA performing better at lower levels of traffic, with
relatively low probability of bit error.
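
As a small illustration of the block code mentioned above, the sketch
below builds a length-8 bi-orthogonal code and checks its size and
minimum distance. It assumes the standard construction (rows of the 8x8
Sylvester-Hadamard matrix written with bits 0/1, together with their
complements, equivalently the first-order Reed-Muller code of length 8);
the summary does not say which construction was actually simulated, and
the function names are hypothetical.

```python
from itertools import combinations


def biorthogonal_8_4():
    """Return the 16 codewords of the length-8 bi-orthogonal code.

    Entry (r, c) of the Sylvester-Hadamard matrix in 0/1 form is the
    parity of the bitwise AND of r and c; the code consists of the
    eight rows and their eight bitwise complements.
    """
    rows = [[bin(r & c).count("1") % 2 for c in range(8)] for r in range(8)]
    return rows + [[1 - bit for bit in row] for row in rows]


def hamming(u, v):
    """Number of positions in which two words differ."""
    return sum(a != b for a, b in zip(u, v))


code = biorthogonal_8_4()
d_min = min(hamming(u, v) for u, v in combinations(code, 2))
print(len(code), d_min)
```

Sixteen codewords carry 4 information bits in 8 code symbols, which is
the (8,4) rate-1/2 format named in the text.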
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ABSTRACT 


The capacity of a coherent frequency-hopped (C-FH) spread-spectrum
system is investigated. The channel comprises an M-ary phase-shift
keying (M-PSK) modulator, a frequency hopper with phase-continuous
carrier, the transmission medium, a nonideal phase-coherent frequency
dehopper, an M-PSK demodulator, and a carrier tracking system. Additive
white Gaussian noise is considered. The analysis focuses on the
effect of imperfect recovery of the carrier phase on the demodulation
process. The carrier tracking system includes a maximum likelihood
estimator of the phase error and a first-order digital phase-lock loop. The
phase error is modeled as a Markov process. An expression for the state
transition probabilities of the phase error process is given. Using bounds
on the entropy of a function of a Markov process, lower bounds to the
capacity of C-FH channels are derived. The input symbols are assumed
uniformly distributed and the encoding process independent of the
frequency slot selected to send each symbol. Numerical results obtained
for various values of M and of the number of frequency slots are
presented.

SUMMARY 

In this paper, the capacity of a frequency-hopped spread-spectrum
communication system with coherent demodulation is investigated. It is
known that large gains over systems employing noncoherent demodulation
are attainable if trellis coded M-ary phase-shift keying (M-PSK)
and coherent demodulation with maximum-likelihood sequence detection
are adopted [1]. In frequency-hopped spread-spectrum systems, however,
a major obstacle to coherent demodulation is represented by the
difficulty for the receiver to maintain phase coherency between the carrier of
the incoming signal and a locally generated waveform.

In early work on coherent frequency-hopped (C-FH) spread-spectrum
communication systems, ideal carrier tracking was assumed [2,3]. In
more recent work, it was proposed to generate the C-FH signal by
phase-modulating the harmonics of a reference sinewave [4,5]. The
carrier tracking system recovers the phase of the reference sinewave to
dehop the received signal coherently, provided the phase relationships
between the reference sinewave and each of its harmonics are known.
The feasibility of such a method has been demonstrated for low
signal-to-noise ratios and binary PSK modulation.

We extend the approach described in [4,5] to a channel that comprises
an M-PSK modulator, a frequency hopper with phase-continuous
carrier, the transmission medium, a nonideal phase-coherent frequency
dehopper, an M-PSK demodulator, and a carrier tracking system. Additive
white Gaussian noise is considered. The carrier tracking system
includes a maximum-likelihood estimator of the phase error between the
transmit and receive reference sinewaves, and a first-order digital
phase-lock loop. The analysis focuses on the effect of imperfect phase recovery
on the demodulation process. Assuming that the frequency of the reference
sinewave is perfectly known, it is shown that the phase error $\{\phi_n\}$
can be modeled as a Markov process. The state transition probabilities of
the process $\{\phi_n\}$ are evaluated.

We consider two channels: one for which the input to the decoder is
just the channel output sequence $\{Y_n\}$, and one for which in addition the
frequency-hop pattern $\{L_n\}$ is known to the decoder. The capacity of the
channel which outputs the sequence $\{Y_n\}$ only is given by

$$C = \lim_{n \to \infty} \max \frac{1}{n} I(X^{(n)}; Y^{(n)}), \qquad (1)$$

where $X^{(n)} = (X_1, X_2, \ldots, X_n)$ is the vector of channel input symbols,
$Y^{(n)} = (Y_1, Y_2, \ldots, Y_n)$ is the vector of channel outputs, and
$I(X^{(n)}; Y^{(n)})$ is the average mutual information between the random
vectors $X^{(n)}$ and $Y^{(n)}$, which can be expressed as the difference between
the entropy of $Y^{(n)}$ and the conditional entropy of $Y^{(n)}$ given $X^{(n)}$, i.e.,

$$I(X^{(n)}; Y^{(n)}) = H(Y^{(n)}) - H(Y^{(n)} \mid X^{(n)}). \qquad (2)$$


The maximum in (1) is over all distributions of the random vector $X^{(n)}$.
We do not allow this distribution to depend on the random vector $L^{(n)}$,
i.e., the encoding process is independent of the frequency slot used to
send each symbol. In addition, we observe that each symbol is used with
the same probability in most codes of practical interest. Thus we restrict
ourselves to the computation of the mutual information for a uniform
distribution of the input symbols. Since the channel is not memoryless,
capacity is not trivial to calculate. However, bounds on the entropy of a
random process which is a function of a Markov process are known
[6,7]. Since $\{(Y_n, L_n, \phi_n)\}$ is a Markov process, it follows that $\{Y_n\}$ is a
function of a Markov process. If we let

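$$H(Y) = \lim_{n \to \infty} \frac{1}{n} H(Y^{(n)}), \qquad (3)$$

$$H(Y \mid X) = \lim_{n \to \infty} \frac{1}{n} H(Y^{(n)} \mid X^{(n)}), \qquad (4)$$

then the following bounds can be applied, for n = 1, 2, ...:

$$H(Y_n \mid Y_{n-1}, \ldots, Y_1, Y_0, L_0, \phi_0) \le H(Y) \le H(Y_n \mid Y_{n-1}, \ldots, Y_1), \qquad (5)$$

and

$$H(Y_n \mid Y_{n-1}, \ldots, Y_1, Y_0, L_0, \phi_0, X_n, \ldots, X_1) \le H(Y \mid X) \le H(Y_n \mid Y_{n-1}, \ldots, Y_1, X_n, \ldots, X_1), \qquad (6)$$

which can be computed by using the known state transition probabilities
of the phase error process. If, in addition to $Y_n$, $L_n$ is also available to
the decoder, the capacity becomes

$$C = \lim_{n \to \infty} \max \frac{1}{n} I(X^{(n)}; Y^{(n)}, L^{(n)}). \qquad (7)$$

Expressions similar to (5)-(6) can be used to bound the capacity given
by (7).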
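
To make the bounding technique concrete, the following sketch evaluates
the two-sided bound on the entropy rate of a function of a Markov chain
for a small toy chain. Everything in it is an assumption made for
illustration: the three-state transition matrix, the output map f and the
function names are hypothetical, the chain is started in its stationary
distribution, and the lower bound conditions on the first hidden state
(the textbook form) rather than on the time-0 triple used in (5). It is
not the phase-error process of the paper.

```python
import itertools
import math


def stationary(P, iters=2000):
    """Approximate the stationary distribution of P by power iteration."""
    m = len(P)
    pi = [1.0 / m] * m
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(m)) for j in range(m)]
    return pi


def entropy(dist):
    """Entropy in bits of a distribution stored as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def entropy_rate_bounds(P, f, n):
    """Return (lower, upper) bounds in bits on the entropy rate of Y_k = f(S_k).

    upper = H(Y_n | Y_{n-1}, ..., Y_1)
    lower = H(Y_n | Y_{n-1}, ..., Y_1, S_1)
    Both are found by brute-force enumeration of all state paths of length n.
    """
    pi = stationary(P)
    p_y, p_y_prev, p_sy, p_sy_prev = {}, {}, {}, {}
    for seq in itertools.product(range(len(P)), repeat=n):
        prob = pi[seq[0]]
        for a, b in zip(seq, seq[1:]):
            prob *= P[a][b]
        y = tuple(f[s] for s in seq)
        s1 = (seq[0],)
        for table, key in ((p_y, y), (p_y_prev, y[:-1]),
                           (p_sy, s1 + y), (p_sy_prev, s1 + y[:-1])):
            table[key] = table.get(key, 0.0) + prob
    upper = entropy(p_y) - entropy(p_y_prev)
    lower = entropy(p_sy) - entropy(p_sy_prev)
    return lower, upper


# Toy example (assumed values): three hidden states, binary observation.
P = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
f = [0, 0, 1]
lower, upper = entropy_rate_bounds(P, f, n=6)
print(lower, upper)
```

Increasing n tightens the gap between the two values, at a cost that
grows exponentially with n in this brute-force form.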

Numerical results showing lower bounds to the capacity of a C-FH
channel are presented for various values of M and of the number of
frequency slots.
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Summary 

For any multi-user communication system, the measure of its
economic usefulness is not the maximum number of users
which can be served at one time, but rather the peak load that
can be supported with a given quality and with availability of
service as measured by the blocking probability. This is the
probability that a new user will find all channels busy and
hence be denied service, generally accompanied by a busy
signal. Adequate service is usually associated with a blocking
probability of 2% or less. The average traffic load in terms of
average number of users requesting service resulting in this
blocking probability is called the Erlang capacity of the
system.
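
For the conventional hard-blocking case, the definition above can be
turned into a short calculation. The sketch below assumes the classical
Erlang B (blocked calls cleared) model, which the summary does not name
explicitly, and finds by bisection the offered load that gives a target
blocking probability. The function names and the 30-channel example are
hypothetical; the soft-blocking CDMA analysis of the paper is not
modelled.

```python
def erlang_b(channels, load):
    """Erlang B blocking probability for `channels` servers and `load` erlangs.

    Uses the numerically stable recursion
    B(0) = 1,  B(k) = load * B(k-1) / (k + load * B(k-1)).
    """
    b = 1.0
    for k in range(1, channels + 1):
        b = load * b / (k + load * b)
    return b


def erlang_capacity(channels, target=0.02, tol=1e-9):
    """Largest offered load (erlangs) whose blocking stays at or below target."""
    lo, hi = 0.0, float(channels)
    while erlang_b(channels, hi) < target:  # widen the bracket if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if erlang_b(channels, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo


capacity = erlang_capacity(30)  # assumed example: 30 channels, 2% blocking
print(round(capacity, 2))
```

The bisection relies on blocking probability increasing monotonically
with offered load for a fixed number of channels.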

In virtually all existing multi-user circuit-switched systems,
blocking occurs when all frequency slots or time slots have
been assigned to a voice conversation or message. In code
division multiple access (CDMA) systems in contrast, users all
share a common spectral frequency allocation over the time
that they are active. Hence, new users can be accepted as long
as there are receiver-processors to service them, independent
of time and frequency allocations. We assume that a sufficient
number of such processors are provided in the common base
station such that the probability of a new arrival finding them
all busy is negligible. Rather, blocking in CDMA systems will
be defined to occur when the interference level, due primarily
to other user activity, reaches a predetermined level above the
background noise level of mainly thermal origin. While this
interference-to-noise ratio could, in principle, be made
arbitrarily large, when the ratio exceeds a given level (about 10
dB nominally), the interference increase per additional user
grows very rapidly, yielding diminishing returns and
potentially leading to instability. Consequently, blocking in
CDMA is defined as the event that the total
interference-to-background noise level exceeds a given threshold and we
determine the Erlang capacity which results in a given
probability of this event (e.g. 1%). We emphasize, however,
that this is a "soft blocking" condition, which can be relaxed as
will be shown, as contrasted to the "hard blocking" condition
wherein channels are all occupied.


Also, in conventional systems a fraction of the time or
frequency slots must be set aside for users to transmit requests
for initiating service and a protocol must be established for
multiple requests when two or more users collide in
simultaneously requesting service. In CDMA systems even
the users seeking to initiate access can share the common
medium. Of course, they add to the total interference and
hence lower the Erlang capacity to some degree. We
demonstrate that this reduction is very small for initial access
requests whose signaling time is on the order of a few percent
of the average duration of a call or message.
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We  consider  an  application  of  codes  correcting  errors 
and  erasures  for  decreasing  delay  of  messages  in  networks 
with datagram service. Let any message consist of k
packets, and let the sender add r redundancy packets in such
a way that all n = k + r packets together form a codeword
of some Q-ary code with minimum Hamming distance d,
where Q = 2^m and m is the length of any packet in bits.
Then the receiver can recover a message immediately after
obtaining the first n - d + 1 packets, because it considers
the remaining d - 1 packets, which have not yet arrived, as
erasures and corrects them. In particular, the receiver
can recover a message after obtaining the first k packets
if Reed-Solomon codes are used [1].

Denote by $t_i$ the delay of the i-th packet and assume
that the delays $t_1, t_2, \ldots, t_n$ are independent identically
distributed random variables. Denote by $t_{i:n}$
the i-th order statistic of the sample $(t_1, t_2, \ldots, t_n)$, i.e.
$t_{1:n} \le t_{2:n} \le \cdots \le t_{n:n}$. Then the delay of a message equals
$T = t_{k:k}$ for the ordinary procedure and equals $T^R = t^R_{k:n}$ for
the procedure described above when R-S codes of rate R = k/n
are used. The superscript R also shows that one has to
recalculate the packet delay, because the average customer
arrival rate $\lambda$ increases by a factor of 1/R. We suppose (see
[2]) that the average packet delay equals

$$E[t] = a/(1-\rho), \qquad (1)$$

where $\rho = \lambda/\mu$ is the utilisation factor and a is some
constant depending on a given structure of network and
fixed proportions of input flows. We also assume that the
delay of any packet is exponentially distributed. It is well
known that for this distribution the average value of the
k-th order statistic equals
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$$E[t_{k:n}] = E[t] \sum_{i=n-k+1}^{n} \frac{1}{i}.$$

Using this fact and the assumption (1) and by putting
$R = 2\rho/(1+\rho)$, one gets that the procedure of encoding
messages certainly gives a gain if

$$2 \ln \frac{1+\rho}{1-\rho} < C + \ln k,$$

where C is the Euler constant. We generalise this result
to: networks with "impatient" messages; unreliable networks;
networks with a "time-out" procedure.
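
The gain condition can be checked numerically. The sketch below is an
illustration under the assumptions stated in the text (exponentially
distributed packet delays with mean a/(1 - utilisation), and utilisation
rising from rho to rho/R under coding); the function names, the choice
a = 1 and the example values k = 100, rho = 0.5 are hypothetical.

```python
import math


def order_stat_mean(k, n, mean):
    """E[t_{k:n}] for n i.i.d. exponential delays with the given mean."""
    return mean * sum(1.0 / i for i in range(n - k + 1, n + 1))


def mean_message_delay(k, n, rho, a=1.0):
    """Mean delay when k packets are sent as n coded packets (n >= k).

    rho is the utilisation without coding; coding at rate R = k/n raises
    it to rho / R, which must stay below 1 for a finite delay.
    """
    rate = k / n
    load = rho / rate
    if load >= 1.0:
        return math.inf
    return order_stat_mean(k, n, a / (1.0 - load))


def gain_condition(k, rho):
    """Sufficient condition from the text: 2 ln((1+rho)/(1-rho)) < C + ln k."""
    euler_c = 0.5772156649015329
    return 2.0 * math.log((1.0 + rho) / (1.0 - rho)) < euler_c + math.log(k)


k, rho = 100, 0.5
n = round(k * (1 + rho) / (2 * rho))  # chosen so that R = k/n = 2*rho/(1+rho)
uncoded = mean_message_delay(k, k, rho)
coded = mean_message_delay(k, n, rho)
print(gain_condition(k, rho), uncoded, coded)
```

With these values the condition holds and the coded mean delay is the
smaller of the two, even though coding increases the load on the network.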
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Abstract  The performance of FCMA is analysed in environments
where the dominant sources of interference are firstly, the multiple
access noise of other users and secondly, AWGN. Ultimately,
multiple access noise is the limiting factor in performance. The
intended transmission to the selected addressee is always symbol
synchronous and phase coherent. The interferers are observed in
one instance when they are symbol synchronous and phase
coherent and again while symbol synchronous but noncoherent in
phase. The symbol synchronous phase coherent interferers are
found to represent the worst case performance.

Summary.  FCMA was first proposed by Stevenson et al [1], as a
new form of multiple access for packet satellite communications.

In the proposed system, users share the resource on a
nonorthogonal basis, as is done in variants of Code Division
Multiple Access, such as Frequency Hopped Multiple Access
(FH/CDMA) and Direct Sequence Multiple Access (DS/CDMA).
However, instead of user signatures occupying a time varying
narrow band (FH/CDMA) or the full bandwidth continuously
(DS/CDMA), signatures consist of quasi-orthogonal combs of
frequencies. For this reason the scheme is termed Frequency Comb
Multiple Access (FCMA). Signature combs interleave one another
over the available bandwidth and provide the basis for both
multiple access and information transmission. FCMA belongs to
the same family as Random Multiple Access (RMA) [2], with the
difference that code signatures are restricted to combinations of
discrete frequencies which are on for the complete symbol
duration. This is in contrast to RMA, where signatures are
combinations of energy elements, from the (symbol)
time-(available) bandwidth plane.

A unique feature of FCMA is the manner in which addressing and
information conveyance are jointly accomplished.

Each user has a look up table giving the signature sets of all other
users in the system. The transmitter can then be programmed to
use the signature sets of any other user in the system and hence to
communicate with any other user.

The selected addressee has a receiver tuned to its own signature
set. There is also considerable advantage in being able to monitor
other channel transmissions, as this will determine if the intended
addressee's receiver is presently being interrogated and hence
reduce the possibility of collision. In FCMA, this is a significant
aspect of communication, as opposing schemes, such as Aloha,
experience collisions whenever two users share the channel
simultaneously, whereas with FCMA, this only occurs when two
messages are simultaneously directed to the same user.

Performance is first considered in a multiple access noise only
environment, where there are (X-1) interferers and it is assumed
that signatures are generated by randomly selecting n frequencies
from an available pool of M; the results can then be easily checked
by simulation. The resulting BER when each user has M=2^k
signatures and thus conveys k bits per symbol is:

P.  =  V  -L  (py  (1  n)«f^-i)  (1) 

*  2»-i  M  I  y  j  ' 


rri  (-«•(:)  »-iJ' 

Equation (1) then enables BER to be plotted versus both active
user numbers and channel utilisation (z), where for FCMA,
z = Xk/N (bits / hertz).

When  signatures  with  controlled  overlap  are  used,  for  example 
where  any  two  signatures  have  at  most  one  element  in  common, 
there  is  a  minimum  number  of  simultaneous  users  X  before  errors 
occur  in  the  absence  of  AWGN. 

Performance  in  AWGN  was  also  investigated.  It  was  assumed  that 
maximum  likelihood  detection  was  used  together  with  signatures 
having at least one common element (Yates [3] and Wu [2]). The
system  was  assumed  to  be  symbol  synchronous  with  all  users’ 
frequency  combs  aligned  in  phase.  The  derivation  is  from  first 
principles  but  is  omitted  here  due  to  space  limitations. 

A  comparison  of  simulation  and  analytic  results  for  the  above  case 
shows  close  agreement  and  establishes  an  upper  bound  to 
performance.  The  claim  that  this  is  an  upper  bound  is  justified  by 
the  fact  that  inphase  interferers  give  worst  erosion  of  distance 
between  symbols. 

Simulation results indicate that noncoherent interferers have a
significant performance margin over coherent interferers. This is
an intuitively satisfying result, as noncoherent interferers would
generally occur in practice. Because of the distance properties of
the signature set chosen, the system is power limited (AWGN
dominated) when X < (n+1). When X >= (n+1), there is a value
of E_b/N_0 at which the system becomes dominated by multiple
access noise. For n=3 a value of E_b/N_0 > 15 dB was required and
for n=5 a value of E_b/N_0 > 19 dB.

Applications for FCMA include: satellite multiple access in which
the traffic is bursty and a guaranteed level of performance is
required, i.e. low data rate VSATs; control channels for DAMA;
emergency maritime communications; networks in which the active
user population is a fraction of a much larger potential user
population; mobile communications; indoor wireless; environments
requiring frequency diversity; delay-intolerant systems, e.g. speech;
and inquiry-response traffic using short packets.

As signatures convey both address and data information, there are
no  address  overheads  and  very  short  packets  are  viable,  since  rapid 
acquisition  can  be  achieved  using  a  short  preamble  provided  an 
FFT  receiver  is  used.  System  design  involves  choices  for  three 
parameters  N,  n,  and  k.  Considerable  flexibility  is  therefore 
available  to  meet  particular  traffic  requirements  and  constraints. 
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vol.2.  Computer  Science  Press,  1985. 
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Abstract 

The performance of hybrid random-access systems (HRAS), which use
a combination of multi-user codes (MUC) and collision resolution
algorithms (CRA) to accommodate the bursty transmissions of many
independent users on a single communication channel, is analysed.
Besides the computation of the system throughput, another
contribution of this paper is the determination of the properties
which a MUC must possess such that the resulting HRAS performs
better than a system which is based only on CRA's.

I.  Introduction 

We aim at combining the collision resolution (CR) and the
multi-user information theory (MUIT) approaches [1] by using a
Hybrid Random-Access System (HRAS) which uses both a collision
resolution algorithm (CRA), e.g., as described in [2], and a
multi-user code (MUC), e.g., as described in [3]. The intuitive
underlying idea is that small collisions among T or fewer users
(e.g., T = 2 or T = 3) are "resolved" by using MUC's and large
collisions (which are assumed to occur very infrequently) are
resolved by using CRA's. A notable improvement of the HRAS over one
which uses only CRA's occurs if the roundtrip delay is large.
Even though the delay performance is of primary interest when
adding coding to a CRA, throughput analysis is necessary and yields
some initial results about how much gain coding might offer.

II.  Blocked  and  Free-Access  HRAS’s 

The analysis was done for the basic blocked and free-access
channel access protocols (CAP) given in [2], using the capacity of
the T-user real adder channel [4] as an upper bound for the rate of
(Nc,T) codes (T active users out of Nc) and the codes given in [3]
as a lower bound. Fig. 1 and Fig. 2 show the maximum stable
throughput for HRAS(7,3) (upper and lower bounds) together with
the maximum stable throughput for the uncoded RAS's [2] versus Q
(after a collision, each transmitter involved flips a fair
"Q-sided coin") for blocked and free-access systems, respectively.

III.  Conclusions 

1. Maximum stable throughput can be substantially improved (e.g.,
with Q = 3 the free-access RAS achieves 0.4016, whereas the
free-access HRAS(7,3) achieves 0.664).


2. T = 3 gives the best performance out of all (Nc,T) codes for
both blocked and free-access RAS's [2]. Thus, the search for HRAS's
can be considerably narrowed, since large T offers no advantage at
all.

3. The upper bound of (Nc,T) codes [4] tends to get loose as Nc
diverges from T, which might suggest that the capacity of (Nc,T)
codes should be a function of Nc. Note that for practical reasons
large Nc is not desirable in our case.
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1.0  Introduction 

Frequency hopping is one of the common techniques for spreading
the signal spectrum in digital data communication systems. The
amount of frequency spread is far more than the minimum bandwidth
necessary to transmit the digital data. This fact makes it feasible
for many users to share a common channel. This paper is concerned
with the construction of new families of slow frequency hopping
patterns derived from sequences over a given semi-local residue
class polynomial ring (GF(p)[ξ]/w(ξ): the set of polynomials over
GF(p) modulo w(ξ)), where w(ξ) is a polynomial of degree n over
GF(p).

The frequency library in a slow frequency hopping spread spectrum
(SFHSS) system consists of a large number of frequency carriers
which are chosen to be orthogonal to each other over the
transmission time duration T. The carriers are obtained by
subdividing the entire bandwidth into contiguous frequency slots.
For multiple-access purposes, each user is provided with a distinct
hopping pattern of period L. Each symbol in a hopping pattern is
drawn from the frequency library and determines the frequency band
within which transmission takes place.

Correlation  requirements  on  the  patterns: 


Normally in frequency hopping systems, it is required that the
mutual Hamming correlation between sequences should be small. In
SFHSS systems, one or more symbols are transmitted within one
frequency hop (slot) and a hit would mean total loss of the data
transmitted in that hop [2,3]. Thus, apart from minimising the
mutual Hamming correlation between patterns, hits resulting from
the presence of all the patterns in the system should be minimised.
This prompts us to define generalised Hamming correlation functions
which depend on all the sequences in a family, unlike Hamming
correlations, which depend on only two sequences.

Let S^m, m = 1,...,n, be n sequences of length L over a certain
alphabet; then the generalised Hamming cross-correlation function
concerning the m-th sequence is given by

\[ GHC_m(\tau_1, \tau_2, \ldots, \tau_{m-1}, \tau_{m+1}, \ldots, \tau_n) = \sum_{i=0}^{L-1} gh\left(S^m_i;\ S^j_{i+\tau_j} \text{ for all } j \neq m\right), \]

where gh is a function given by

rfrf*.  b  b  b  W  1  ^ 

The corresponding autocorrelation function is given by

OHAJr,,rj,.  ..,g=^  .for  all  j) 

For proper multi-user operation, patterns for an SFHSS system
should have ideal GHC properties (the cross-correlation function is
equal to zero for all values of τ_j).
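
The idea of a correlation measure that depends on all sequences at once can be sketched as follows. Because the definition of gh is not legible here, the sketch assumes that a position counts as a hit when the symbol of sequence m coincides with the symbol of at least one other shifted sequence; that rule, and the function names, are assumptions made only for illustration.

```python
def hamming_cross_correlation(a, b, tau):
    """Number of positions i where a[i] == b[(i + tau) % L] (periodic)."""
    L = len(a)
    return sum(1 for i in range(L) if a[i] == b[(i + tau) % L])

def generalised_hits(seqs, m, taus):
    """Count positions where sequence m is hit by at least one other sequence.

    ASSUMPTION: this 'any other user collides' rule stands in for the
    function gh, whose definition is not legible in the source.
    taus[j] is the cyclic shift applied to sequence j (taus[m] is ignored).
    """
    L = len(seqs[m])
    hits = 0
    for i in range(L):
        if any(seqs[j][(i + taus[j]) % L] == seqs[m][i]
               for j in range(len(seqs)) if j != m):
            hits += 1
    return hits

if __name__ == "__main__":
    # Sequences drawn from disjoint symbol sets never collide, whatever the shifts.
    disjoint = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    print(max(generalised_hits(disjoint, 0, [0, t1, t2])
              for t1 in range(3) for t2 in range(3)))
```

Sequences whose symbols come from mutually exclusive sets give zero hits for every combination of shifts, which is the "ideal" behaviour in the sense used above.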

2.0  Main  Results 

The frequency hopping patterns are obtained by associating with
each symbol a in the ring GF(p)[ξ]/w(ξ) a distinct frequency f_a
belonging to the frequency library. Properties of orthogonal ideals
of GF(p)[ξ]/w(ξ) and the internal direct sum representation of the
ring have been made use of to define frequency hopping patterns.
Let w(ξ) = w_1(ξ) w_2(ξ), where w_1(ξ) and w_2(ξ) are relatively
prime polynomials of degrees n_1 and n_2 respectively; n = n_1 + n_2.
Then GF(p)[ξ]/w(ξ) can be represented as the internal direct sum of
ideals isomorphic to the rings GF(p)[ξ]/w_1(ξ) and GF(p)[ξ]/w_2(ξ).
Let e_1(ξ) and e_2(ξ) be orthogonal idempotent polynomials in
GF(p)[ξ]/w(ξ) corresponding to the rings GF(p)[ξ]/w_1(ξ) and
GF(p)[ξ]/w_2(ξ). Then elements of the ideals ⟨e_1(ξ)⟩ and ⟨e_2(ξ)⟩
mutually annihilate each other [4]. Thus elements of the cosets of
the ideal ⟨ ⟩ in GF(p)[ξ]/w(ξ) are all distinct. Sequences are
defined in such a way that elements of a sequence belong to a
distinct coset. Since these cosets are mutually exclusive (there is
no common element among these cosets), ideal generalised Hamming
correlation properties follow naturally. Construction of slow
hopping sequences makes use of optimal frequency hopping sequences
over local rings derived in [1]. The following new families are
derived.
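
The direct sum decomposition can be checked by brute force on a toy ring. The sketch below assumes p = 3 and w(x) = x^3 + x = x (x^2 + 1), a small example chosen here and not taken from the abstract; it writes the indeterminate as x, and all names are hypothetical.

```python
from itertools import product

P = 3  # assumed toy field GF(3); w(x) = x^3 + x = x * (x^2 + 1)

def mul(a, b):
    """Multiply two elements (c0, c1, c2) of GF(3)[x]/(x^3 + x)."""
    prod = [0] * 5
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] += ai * bj
    # Reduce using x^3 = -x and x^4 = -x^2.
    prod[1] -= prod[3]
    prod[2] -= prod[4]
    return tuple(c % P for c in prod[:3])

def add(a, b):
    """Add two ring elements coefficient-wise modulo P."""
    return tuple((x + y) % P for x, y in zip(a, b))

ELEMENTS = list(product(range(P), repeat=3))
ZERO, ONE = (0, 0, 0), (1, 0, 0)

def orthogonal_idempotents():
    """Find e1, e2 with e1^2 = e1, e2^2 = e2, e1*e2 = 0 and e1 + e2 = 1."""
    idem = [e for e in ELEMENTS if mul(e, e) == e and e not in (ZERO, ONE)]
    for e1 in idem:
        for e2 in idem:
            if mul(e1, e2) == ZERO and add(e1, e2) == ONE:
                return e1, e2
    return None

def ideal(e):
    """Principal ideal generated by e."""
    return {mul(e, r) for r in ELEMENTS}

if __name__ == "__main__":
    e1, e2 = orthogonal_idempotents()
    I1, I2 = ideal(e1), ideal(e2)
    print("idempotents:", e1, e2, "ideal sizes:", len(I1), len(I2))
    print("mutually annihilating:",
          all(mul(a, b) == ZERO for a in I1 for b in I2))
```

In this toy ring the two ideals have 9 and 3 elements, every product of an element of one with an element of the other is zero, and every ring element is a unique sum of one element from each ideal.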

1.  A  family  of  p*^  aequencea  of  period  L  s  p*^l  over 

where n = n_1 + n_2, by using a sequence over GF(p)[ξ]/w_1(ξ)
of period p^{n_1} - 1, where w_1(ξ) is an irreducible factor (of
degree n_1) of w(ξ). These sequences satisfy ideal generalised
Hamming correlation properties.

2.  A  bmily  of  ap**  aequencea  of  period  L  =  p**— 1  over 
GF(p)[ξ]/w(ξ), n = n_1 + n_2, by using p one-coincidence sequences
over GF(p)[ξ]/w_1(ξ), each of period p^{n_1} - 1. The generalised
Hamming cross- and autocorrelations for any sequence in the family
are given by

OHOC^(rj,Tj,...,T^)  <  p-1  for  all  r.  #  t„ 

A code generation scheme, based on the direct sum decomposition
in semi-local rings, for slow hopping multiple access communication
systems is given, where different users can have different
frequency diversity.
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BOUNDS  ON  THE  CAPACITY  OF  AN  AWGN  CHANNEL  WITH 
INTERTRANSITION  CONSTRAINED  BIPOLAR  INPUTS 
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Department  of  Electrical  Engineering 
Technion - Israel Institute of Technology, Haifa 32000, Israel


We present lower and upper bounds on the capacity of an AWGN
channel, the input to which is a bipolar (±1) waveform with a
constraint that the minimum duration between transitions is no
shorter than T_min.

This model is used to characterize certain magnetic recording
channels where bipolar signaling is preferred due to the hysteresis
phenomenon of the magnetic media and the minimal intertransition
duration constraint is imposed so as to mitigate the heavy (possibly
nonlinear) intersymbol interference effects.

The upper bounds are based on Duncan's formula that interrelates
the average mutual information to the average minimum mean-square
error (MMSE) of the causal optimal estimator. To this end the MMSE
of suboptimal linear and nonlinear estimators is studied [1] and the
guard-time random telegraph signal [2] is also considered.

The lower bounds are found by considering bipolar runlength
limited (d,∞) codes where d (integer) is related to the minimal
intertransition constraint by d = T_min/Δ and where Δ stands for the
duration of the bipolar channel symbol. The asymptotic (d → ∞)
expression for the entropy of the max-entropic (d,∞) bipolar
sequences is invoked along with a recent extension of Mrs. Gerber's
Lemma [3] (to account for any binary input-output symmetric channel)
to yield the lower bounds, which are optimized with respect to d ≥ 1
(or equivalently Δ = T_min/d). Pulse amplitude and pulse width
modulated signals are also considered in the context of lower
bounding the capacity [1].
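
The entropy of max-entropic runlength-limited bipolar sequences can be computed numerically. This sketch assumes that d = T_min/Δ is the minimum number of channel symbols between transitions, i.e. every run of identical symbols lasts at least d symbols (in the usual (d,k) convention for transition sequences this corresponds to d - 1). It uses the standard result that the entropy is the base-2 logarithm of the largest real root of z^d - z^(d-1) - 1 = 0; the function name is hypothetical and the noise-free rate it prints is not one of the bounds of the abstract.

```python
import math

def max_entropy_rate(min_run):
    """Entropy (bits/symbol) of maxentropic bipolar sequences whose runs
    of identical symbols all last at least `min_run` symbols.

    Uses the largest real root of z**min_run - z**(min_run - 1) - 1 = 0.
    """
    if min_run < 1:
        raise ValueError("min_run must be >= 1")

    def f(z):
        return z ** min_run - z ** (min_run - 1) - 1.0

    lo, hi = 1.0, 2.0          # f(1) = -1 < 0 and f(2) >= 0, f increasing
    for _ in range(200):       # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return math.log2(hi)

if __name__ == "__main__":
    for d in range(1, 6):
        h = max_entropy_rate(d)
        # With symbol duration T_min / d, the noise-free rate is d * h bits per T_min.
        print(d, round(h, 4), round(d * h, 4))
```

For d = 1 the sequence is unconstrained (1 bit per symbol); for d = 2 the root is the golden ratio.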

It  is  concluded  that  the  capacity  behaves  as  l/lVo  (nats/sec)  for 
SNR=  Tmin/Ao  <  1  and  as  r,;i'„ln  (nats/sec)  for  > 

1,  where  No  denotes  the  power  spectral  density  of  the  AWGN. 

Lower bounds on the capacity with the aforementioned constrained
inputs in the presence of a mildly band-limited (in scales of
T_min^{-1}) channel filter are presented. Explicit expressions are
found by generalizing the recently introduced Shamai-Ozarow-Wyner
lower bound on the capacity of a dispersive discrete-time Gaussian
channel with iid inputs [4], to account for dependent inputs and
incorporating in the generalization a convexity property which is
implied by the extended Mrs. Gerber's Lemma.
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Abstract 


We  consider  a  frequency  non-selective  slowly  time-varying  Rayleigh 
fading  code-division  multiaccess  (CDMA)  additive  white  Gaussian 
noise  (AWGN)  channel.  Assuming  that  the  signature  waveforms  are 
time-limited  to  the  symbol  interval,  we  find  the  capacity  region  of 
the  two-user  symbol-synchronous  channel.  If  the  signature  waveforms 
span  several  symbol  intervals,  we  have  to  further  assume  that  the  fad¬ 
ing  process  is  constant  over  the  duration  of  every  codeword.  In  that 
case,  we  find  the  capacity  and  the  optimal  input  power  spectral  den¬ 
sity  (PSD)  in  a  parametric  expression  similar  to  that  in  the  classical 
water-filling  argument. 

Summary 


We consider a frequency non-selective slowly time-varying Rayleigh
fading  CDMA  AWGN  channel 


\[ y(t) = \sum_i \left[ X_{1i}\, a_1(t)\, s_1(t - iT) + X_{2i}\, a_2(t)\, s_2(t - iT) \right] + n(t), \qquad (1) \]


where the signature waveforms of the users, s_1(t) and s_2(t), are
unit-energy functions strictly time-limited to [0, LT] for some
finite L, and n(t) is the zero-mean complex white Gaussian noise
with independent real and imaginary parts, each with power spectral
density N_0 (i.e., E n(t) n*(t - τ) = 2 N_0 δ(τ)). The power
constraints on the users require that every length-n codeword of
the k-th user has average power at most equal to W_k T. We assume
that the channel is a slowly time-varying channel in the sense that
a_k(t) = a_{ki} for t ∈ [iT, (i+1)T], k = 1,2, and {a_{1i}} and
{a_{2i}} are two independent zero-mean stationary m-dependent
complex Gaussian fading processes having independent real and
imaginary parts. The autocorrelation function of the fading process
{a_{ki}} is denoted by σ_k^2 R_k(i), where σ_k^2 is the power of
{a_{ki}} (i.e., R_k(0) = 1). Finally, we assume that the receiver
has complete knowledge of the fading processes, but the transmitter
knows only the statistics of the fading processes.

The corresponding single-user channel with s(t) ∈ L_2[0, T] (i.e.,
L = 1) is equivalent to the discrete-time frequency non-selective
Rayleigh fading AWGN channel Y_i = a_i X_i + N_i. We have omitted
the subscripts for the users in the single-user case. Since the
receiver knows the fading parameters, the channel is equivalent to
a stationary channel with output (Y_i, a_i) whose capacity is [1]


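\[ \frac{1}{T} C(\Lambda) = \frac{1}{T} E \log\left[ 1 + \frac{W T R}{2 N_0} \right] = -\frac{1}{T} \exp\left(\frac{1}{\Lambda}\right) \mathrm{Ei}\left(-\frac{1}{\Lambda}\right), \]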

where the expectation is taken over R = |a_i|^2, which is
exponentially distributed with mean σ^2, and Λ = W T σ^2 / (2 N_0)
is the average received signal-to-noise ratio. The function Ei(·)
is the exponential-integral function and the last expression
follows from [2, p. 574].
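
The closed form can be checked numerically. The sketch below works in nats per symbol, writes R = σ^2 X with X exponentially distributed with unit mean so that W T R / (2 N_0) = Λ X, and evaluates Ei through the series for E1(x) = -Ei(-x); the function names and the quadrature settings are illustrative choices, not taken from the abstract.

```python
import math

EULER_GAMMA = 0.5772156649015329

def exp_integral_e1(x, terms=80):
    """E1(x) = -Ei(-x) for x > 0 via its power series (fine for moderate x)."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0
    for k in range(1, terms + 1):
        term *= -x / k          # term = (-x)^k / k!
        total -= term / k
    return total

def capacity_closed_form(snr):
    """C(snr) = -exp(1/snr) * Ei(-1/snr), in nats per symbol."""
    return math.exp(1.0 / snr) * exp_integral_e1(1.0 / snr)

def capacity_numeric(snr, steps=200_000, upper=40.0):
    """E[log(1 + snr * X)] for X ~ Exp(1), by the midpoint rule."""
    h = upper / steps
    return sum(math.log1p(snr * (i + 0.5) * h) * math.exp(-(i + 0.5) * h)
               for i in range(steps)) * h

if __name__ == "__main__":
    for snr in (0.5, 1.0, 5.0):
        print(snr, capacity_closed_form(snr), capacity_numeric(snr))
```

The two columns should agree to several decimal places, which confirms the sign convention used for Ei.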

We extend the above single-user result in two directions: (1) the
capacity region of the two-user channel with s_k(t) ∈ L_2[0, T], and
(2) the capacity of the single-user channel with s(t) ∈ L_2[0, LT]
and very slowly time-varying fading process.

When the signature waveforms are time-limited to the symbol
interval, the CDMA channel reduces to a discrete-time frequency
non-selective fading multiaccess channel, Y_i = H A_i X_i + N_i,
where A_i = diag[a_{1i}, a_{2i}], E N_i N_j^H = H δ_{i,j},
X_i = [X_{1i}, X_{2i}]^T, and H is the crosscorrelation matrix of
the signature waveforms. The capacity region of this channel is
given by the following theorem.


Theorem  1 

The capacity region of the two-user multiaccess channel in (1) with
s_k(t) ∈ L_2[0, T] is the set of all (R_1, R_2) ∈ R_+^2 satisfying

R_1 < (1/T) C(Λ_1),  R_2 < (1/T) C(Λ_2),


Rx  +  Ri<  5;C(Ai) 

-Ieexp  ( 


1  +  -Rj 


Ai(l  +  R2(l-p2)) 


1  -t-  fis 


(1 -I- R3(1  -  p’)) 


)■ 


where Λ_k = W_k T σ_k^2 / (2 N_0) is the average received
signal-to-noise ratio of user k, and ρ is the crosscorrelation of
the signature waveforms. The expectation is taken over
R_2 = |a_{2i}|^2, which is exponentially distributed with mean
σ_2^2. □


In the special case when the signature waveforms are identical, the
capacity region reduces to the following expression.


Corollary  1 

If the signature waveforms are identical (i.e., ρ = 1), the capacity
region becomes the set of all (R_1, R_2) ∈ R_+^2 satisfying

R_1 < (1/T) C(Λ_1),  R_2 < (1/T) C(Λ_2),

f  i>.gLM-A2C(A2)  ifA.5iA3, 

ifA.  =  A3  =  A. 

□ 


When the signature waveforms of the users span several symbol
intervals (i.e., s(t) ∈ L_2[0, LT]), the single-user channel becomes
Yj  =  where  kj  =  R,(jT)  and  R,{t)  is  the 

autocorrelation  function  of  s(t).  Even  in  the  single-user  case,  the  ca¬ 
pacity  of  this  channel  is  known  only  in  a  limiting  expression.  However, 
if  we  assume  that  the  channel  is  very  slowly  time-varying  so  that  the 
fading  process  is  a  random  constant  over  the  duration  of  any  code¬ 
word,  the  capacity  of  the  single-user  channel  and  the  optimal  input 
PSD  can  be  obtained  in  a  parametric  expression  similar  to  that  in  the 
classical  water-filling  argument. 

Theorem  2 

The  capacity  of  the  frequency  non-selective  very  slowly  time- 
varying  Rayleigh  fading  single-user  channel  is  equal  to 

c  =  f  £  c{P[f)r{f))df 

where  P{f)  is  the  solution  of 

P{f)T{f)  =  F-*([cr(/)-ir), 

A  =  pP{f)df. 

Jo 

In  the  above  equations,  T{f)  =  l‘S’{{/  -  f^)/T)\^^ 

^  “  i_E.xp(i)Ei(-i) 

A  is  the  average  received  signal-to-noise  ratio,  and  S{f)  is  the  spectrum 
of  the  signature  waveform.  □ 

This  result  can  be  viewed  as  a  generalization  of  the  water-filling 
result  to  the  fading  channel  since  if  F{x)  =  i  and  C(x)  =  log[l-f  ij.  the 
above characterization reduces to the classical water-filling argument.
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THE  CHANNEL  CAPACITY  IN  THE  PRESENCE  OF  IMPULSE  NOISE 
Kenneth J. Kerpez

Bellcore,  Morristown,  NJ  07960-1910 


Impulse noise is bursty, high amplitude, low probability noise.
Impulse noise often occurs from man-made disturbances. Impulse
noise is not well understood because it is not Gaussian. However,
impulse noise is a significant impairment for digital transmission.
This paper analyzes impulse noise by information theoretic
calculations. In particular, the channel capacity in the presence of
impulse noise is bounded and computed.

The  signal  and  noise  are  assumed  to  be  band-limited  and  sampled 
at  the  Nyquist  rate.  Impulses  are  assumed  to  have  independent 
Poisson  arrivals.  Each  impulse  noise  sample,  n,  is  modeled  by  the 
probability  density  function 

\[ f_N(n) = (1 - \lambda)\, \delta(n) + \lambda f_V(n) \]


and μ and σ_V^2 are the mean and variance of the impulse height
density, f_V(n). It was found that any of the three lower bounds
may be the tightest, depending on the parameters. The tightest
lower bound to the differential entropy, h(N), is denoted as LB.

The bounds on the differential entropy were used to bound the
channel capacity. Upper bounds to the capacity were found by
using the data processing inequality for Markov chains and the
differential entropy of Gaussian noise. The lower bound to
capacity was found by using the entropy power inequality. It was
shown that

\[ \frac{1}{2} \ln\left[ 2\pi e P + e^{2\,\mathrm{LB}} \right] - \mathrm{UB} \le \text{Capacity} \le \frac{1}{2} \ln\left[ 2\pi e (P + \sigma_N^2) \right] - \mathrm{LB} \]


where λ is the arrival rate of impulses, δ(n) is the Dirac delta
function, and f_V(n) is the "impulse height" density. Assume that
f_V(n) is a continuous function in a neighborhood about n = 0.
Using the theory of generalized functions it was shown that if the
only source of noise is impulse noise, then the channel capacity is
unbounded. Thus, later results assume that there is always some
additive white noise.


Denote the probability density of the additive white noise as f_W(n).
Then  the  density  of  the  sum  of  impulse  noise  and  additive  white 
noise  is 


\[ f_N(n) = (1 - \lambda) f_W(n) + \lambda f_{V+W}(n) \]


with f_{V+W}(n) = f_V(n) * f_W(n), where * denotes convolution.
Upper and lower bounds were derived for the differential entropy,
h(N), of the impulse noise plus additive white noise. First, h(N)
was upper bounded by the differential entropy of Gaussian noise
with the same variance, h(N) ≤ (1/2) ln[2πe σ_N^2] = UB, where
σ_N^2 is the variance of the sum of impulse noise and white noise.
A second upper bound to h(N) was derived by using the
non-negativity of the Kullback-Leibler distance. A third upper
bound was derived by applying the concavity of the logarithm to the
definition of h(N). It was proven that the first upper bound is
always the tightest.


Assuming that the additive white noise is Gaussian, three lower
bounds to h(N) were found. The first lower bound was found with
the entropy power inequality; it is (1/2) ln[2πe σ_W^2] ≤ h(N),
where σ_W^2 is the variance of the additive white noise. The second
and third lower bounds were found by bounding the integral
expression for the differential entropy; they are:

1  _  ILl^  _  -  2X(1  -X)  p s  A(N) 

V^Ow  +  o?)  +  oj) 


and 

l-^|ln[2«]-■^^^- 

2 


V2(oii,  +  oJ) 


Here, 

_ 

-  2(oi  +  c»3) 


2oi+oJ 


1 


2X(1  -  X) 
■yjlalf  +  al 


PSA(N). 


where P is the received signal power. The capacity was also upper
bounded by the capacity of a channel with only additive white
Gaussian noise and no impulse noise.
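
The entropy and capacity bounds can be tried out numerically. The sketch below assumes zero-mean Gaussian impulse heights and Gaussian white noise, so that the total noise is a two-component Gaussian mixture, and it uses only the entropy power inequality lower bound as LB (which need not be the tightest of the three). The parameter values and function names are illustrative assumptions.

```python
import math

def gauss_pdf(x, var):
    """Zero-mean Gaussian density with variance `var`."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_entropy(lam, var_w, var_v, steps=200_000):
    """Differential entropy (nats) of (1-lam) N(0,var_w) + lam N(0,var_w+var_v)."""
    span = 12.0 * math.sqrt(var_w + var_v)
    h = 2.0 * span / steps
    total = 0.0
    for i in range(steps):
        x = -span + (i + 0.5) * h
        f = (1.0 - lam) * gauss_pdf(x, var_w) + lam * gauss_pdf(x, var_w + var_v)
        if f > 0.0:
            total -= f * math.log(f) * h
    return total

def gaussian_entropy(var):
    """Differential entropy (nats) of a Gaussian with variance `var`."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def capacity_bounds(power, lam, var_w, var_v):
    """Lower and upper capacity bounds (nats/sample) from UB and LB on h(N)."""
    var_n = var_w + lam * var_v      # variance of the total noise
    ub = gaussian_entropy(var_n)     # upper bound on h(N)
    lb = gaussian_entropy(var_w)     # entropy-power lower bound on h(N)
    lower = 0.5 * math.log(2.0 * math.pi * math.e * power
                           + math.exp(2.0 * lb)) - ub
    upper = gaussian_entropy(power + var_n) - lb
    return lower, upper

if __name__ == "__main__":
    lam, var_w, var_v, power = 0.01, 1.0, 100.0, 10.0
    print(gaussian_entropy(var_w), mixture_entropy(lam, var_w, var_v),
          gaussian_entropy(var_w + lam * var_v))
    print(capacity_bounds(power, lam, var_w, var_v))
```

With no impulses (lam = 0) the two capacity bounds coincide with the usual (1/2) ln(1 + P/σ_W^2).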

The hyperbolic probability density accurately models the heights,
f_V(n), of impulses observed on local copper telephone loops. The
hyperbolic density is: f_V(n) = C/|n| if V_L ≤ |n| ≤ V_H, and
f_V(n) = 0 otherwise, where C is a constant. The capacity bounds
were evaluated and compared for both Gaussian and hyperbolic
impulse heights. It was found that there is slightly less channel
capacity with Gaussian impulse heights than with hyperbolic
impulse heights.

The capacity bounds were computed and plotted for a variety of
signal and noise powers. It was found that the bounds on channel
capacity are tight if the power of the additive white Gaussian noise
is large, or if the impulse arrival rate, λ, is small. Most
significantly, it was found that the capacity with additive Gaussian
white noise and impulse noise is practically the same as it is with
just additive white Gaussian noise and no impulse noise, as long as
the arrival rate of impulses is less than about 1 impulse every 100
samples. For parameters typical of high speed digital transmission
on telephone loops, the difference in capacity with and without
impulse noise is less than 4 thousandths of a percent. In general,
for most transmission parameters, impulse noise has almost no
effect on channel capacity, even if the additive white Gaussian
noise power is small.

Formulas  were  derived  for  the  bit  error  rate  of  an  uncoded 
transmission  system  in  the  presence  of  impulse  noise  and  additive 
white  Gaussian  noise.  The  bit  error  rate  was  plotted  and  compared 
to the channel capacity. It was shown that impulse noise can
greatly increase the bit error rate, but has almost no effect on
capacity. For parameters typical of telephone loops, the gap
between  the  capacity  and  the  uncoded  bit  error  rate  in  the  presence 
of  impulse  noise  is  a  hefty  40.5  dB,  at  a  KT^  bit  error  rate.  This 
gap  can  be  decreased  to  9  dB  or  less  by  using  interleaved  linear 
block  codes  to  mitigate  impulse  noise  errors,  provided  that  the 
resulting delay can be tolerated.
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Abstract 

The ideal Poisson channel with noiseless feedback models a direct
detection photon channel, free of dark current (λ_0 = 0), in which a
causal feedback link informs the transmitter at every time t of the
channel output at all times prior to t. The paper discusses the
coding for the channel, under peak power and average power
constraints on the input. A coding scheme for the channel is
presented, and its asymptotic error exponent is shown to coincide,
at all rates below capacity, with the Sphere Packing error exponent,
which, for the case at hand, is known to be unachievable without
feedback for rates below the critical rate. An upper bound on the
error exponent achievable with feedback is also derived. It is
shown that under a capacity-reducing average power constraint, the
upper bound coincides with the error exponent achieved by the
proposed coding scheme; consequently, in such a case the coding
scheme is asymptotically optimal. Thus, the ideal Poisson channel,
limited by a capacity-reducing average power constraint, provides a
non-trivial example of a channel for which the Reliability Function
is known exactly both with and without feedback. While our main
concern is fixed transmission time coding schemes, the subject of
random transmission times is also briefly discussed; it is shown
that a slight modification of the coding scheme to one of random
transmission time can achieve zero-error probability for any rate
lower than the ordinary average-error channel capacity.
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A DIRECT GEOMETRICAL METHOD FOR BOUNDING THE ERROR EXPONENT FOR SPECIFIC
FAMILIES OF CHANNEL CODES -

PART II: THE CONFINING REGION LOWER BOUND FOR BLOCK CODES
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Faculty  of  Technical  Sciences 
Computer Science, Control and Measurements Institute
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The introduction of a confining region that divides the
channel output space in two disjoint parts attains the
same effect as Gallager's exponent, but gives much
more insight into the behaviour of channel codes.
Moreover, the bounds obtained are always tight at low code
rates (for optimal codes they are always tight).


dggytS.R)  -  sin  (flg  :  Ejtdg.R)  *  »  .  O) 

the  gotoff  rate  lower  bound  [1]  on  the  expurgated  fasl- 
ly  9  Is  reduced  to  dBGV^*'  R)  at  all  code  rates  less 

than 


Introduction 

The channel error exponent (reliability function)
is defined as

E(R)  -  1^  {--rl‘*[Peopt<''''‘^]}  •  W(x).log2(x),  (1) 

where R is the code rate, P_eopt(R,N) is the smallest
possible probability of block decoding error for codes
of code rate R and length N used on the given channel.

In Part I of this paper, the random coding argument
usually used in lower-bounding the channel error
exponent was discarded in favour of the one that uses
the known expected Bhattacharyya distance distribution
of a code family. If the code family distance
distribution is known, the error exponent obtained pertains to
this specific code family used on the given transmission
channel, and not to the channel itself. The code
family that attains the channel error exponent is the
optimal one, and its Bhattacharyya distance distribution
the optimal distance distribution.


The  general  expression  for  E(R) 

The distance distribution method in its final form
gives the lower bound on the code family error exponent
in the form

E(R)j  -  min  |Ej(dg,R)  ♦  E^{ig,R.S)|  -  R.  (2) 

where  3  denotes  the  code  family,  is  the  norsallzed 
(by  N)  Bhattacharyya  distance,  and  and 

minimum  and  maxlsum  norsallzed  Bhattacharyya  distances 
In  the  code.  E^ldg-R)  Is  the  expected  norsallzed  Bhat¬ 
tacharyya  distance  distribution  exponent  of  9,  and 
E(s^,R,  9)  Is  the  lower  bound  on  the  error  effect  expo¬ 
nent  (see  II]  for  precise  definition  of  these  expo¬ 
nents). 


'*crltl9*  “  ^9  “  ^B9eff  ' 

Here.  R^j  Is  the  code  family  cutoff  rate,  and  Is 

the  norsallzed  Bhattacharyya  distance  of  those  code¬ 
words  that  are  the  only  ones  that  Influence  the  code 
faslly  cutoff  rate  bound.  At  low  code  rates  (not  grea¬ 
ter  than  R  th®  cutoff  rate  bound  Is  tight. 


The confining region

The channel output space may be partitioned in
two using the confining region defined for the m'th
codeword as


cs‘"^(Cb)  - 


P[ylxJ 

iSJE  (Plylxl)  *  ^ 
s.aS’^(Xs.Cg) 


,  a  -  1 . M  ,  (5) 

%rtiere  ■  (a*’'*)  ■  (asl  :  dg(a^.m)  >  Cg)  is 

the exterior of the Bhattacharyya confining sphere in
the available encoding space Z, centered at the actually
transmitted codeword and whose radius is c_B (expressed
in the normalized Bhattacharyya distance, which is
defined in Z, but not in Y^N). Upper-bounding the
probability of error inside CS^(m)(c_B) by the usual union
bound, and outside it by the mere probability that
y ∉ CS^(m)(c_B), one obtains a significant improvement. For
the code families uniformly distributed over Z, this
procedure yields Gallager's lower bound on the
channel error exponent, implying that these codes are
optimal. Moreover, the error exponent of these families
is known at all code rates, since it is known below
R_crit and coincides with the space-partitioning
upper bound on the channel error exponent at high code
rates.


References


The code family Gilbert-Varshamov distance

If a codeword is expurgated from each pair of codewords
in the code whose distance is less than the code family
Gilbert-Varshamov distance, defined as


[1]  D.  E.  LazlC,  V.  Senk,  'A  Direct  Geosetrlcal  Method 
for  Bounding  the  Error  Exponent  for  Any  Specific 
Family  of  Channel  Codes  -  I^t  I:  Cutoff  Rate  Lower 
Bound  for  Block  Codes*,  IEEE  Trtns.  Info.  Th. ,  Vol. 
38,  pp.  1548-1559,  Septesber  1992. 
UNIVERSAL DECODING FOR MEMORYLESS GAUSSIAN CHANNELS
WITH A DETERMINISTIC INTERFERENCE

Abstract

In [1] universal decoding schemes for finite-alphabet, finite-state channels were proposed and shown to be optimal in the sense of attaining the highest possible random coding error exponent, when the channel input vectors are chosen randomly under a uniform probability distribution. In other words, the average error probability over the ensemble of randomly selected codewords decays at the fastest exponential rate. In the special case of DMC's, the proposed universal decoder selects a codeword that minimizes the empirical conditional entropy of the channel input given the channel output.

We derive an analogous result for memoryless Gaussian channels with an unknown deterministic interference from a fairly wide class. The empirical conditional entropy of the input given the output is induced by an auxiliary backward channel whose parameters are estimated from the given output vector and each one of the codewords. We also allow a more general class of input distributions by slightly modifying the decoding rule. Similarly to [1], it is shown that the proposed universal decoder attains the same error exponent as that of the ML decoder which is fully informed of the channel and the interfering signal.

The proposed decoder is different from a heuristic approach [2], where the channel and message are jointly estimated by the ML method. While the former decoder is based on the backward channel as mentioned earlier, the latter corresponds to the forward channel. For the simple special case where there is no interference and the only uncertainty is in the channel fading parameter, it is demonstrated that the error exponent of the proposed rule might be strictly better than that of the joint ML channel-and-message estimation approach.


monotonically nondecreasing integer-valued sequence satisfying ... and ... as n → ∞. Our decoding rule will select a message x^i that maximizes the function

u(x^i, y) = \frac{\max_{\theta} W(x^i | y, \theta, k_n)}{q(x^i)}    (2)

among all M codebook messages.

Theorem: Assume that {z_i} can be expanded to a series of bounded orthonormal functions with an absolutely summable coefficient sequence {b_j}, j ≥ 1. Let q(x) be any Gaussian PDF of the form

q(x)  « -  Z  ® 

where σ_x, m, and γ_1, ..., γ_m are parameters to be chosen. Then,

\lim_{n\to\infty} \frac{1}{n} \log \bar{P}_{e,u}(q, R, n) = \lim_{n\to\infty} \frac{1}{n} \log \bar{P}_{e,o}(q, R, n),    (4)


Summary 

Consider a discrete-time, Gaussian memoryless channel characterized by y_i = a x_i + z_i + w_i, where x_i is the desired channel input, a is an unknown fading parameter, w_i is zero-mean Gaussian white noise with an unknown variance σ², z_i is an unknown deterministic interference, and y_i is the channel output. We assume that z_i can be represented by a series of given orthonormal bounded functions, i.e., z_i = \sum_{j \ge 1} b_j g_{ij}, where \sum_j |b_j| < ∞ and |g_{ij}| ≤ L for all i and j, 0 < L < ∞.

Consider next a codebook C = {x^1, x^2, ..., x^M} of M ≥ 2^{nR} equiprobable messages x^i = (x^i_1, x^i_2, ..., x^i_n) ∈ R^n, i = 1, 2, ..., M, where R is the coding rate in bits per channel use. Clearly, if the parameter a and the interference signal z_i are known, the best decoder is the ML decoder, which in the Gaussian case considered here selects the message x^i that minimizes \sum_{t=1}^{n} (y_t - z_t - a x^i_t)^2. Similarly to [1], the probability of error associated with the ML decoder will be denoted by P_{e,o}(C, R, n).
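The ML rule for the fully informed case can be illustrated with a minimal Python sketch. It assumes a tiny hand-made codebook and known values of the fading parameter and the interference; the names `ml_decode`, `codebook`, `a` and `z` are illustrative and do not come from the summary.

```python
def ml_decode(y, codebook, a, z):
    """Return the index of the codeword minimizing sum_t (y_t - z_t - a*x_t)^2."""
    def cost(x):
        return sum((yt - zt - a * xt) ** 2 for yt, zt, xt in zip(y, z, x))
    return min(range(len(codebook)), key=lambda i: cost(codebook[i]))


# Toy setting (all values are assumptions chosen for illustration).
codebook = [[1.0, 1.0, 1.0], [1.0, -1.0, 1.0], [-1.0, -1.0, -1.0]]
a = 0.5                   # fading parameter, assumed known here
z = [0.2, -0.1, 0.3]      # deterministic interference, assumed known here
sent = 1
# Noiseless channel output for the transmitted codeword.
y = [a * x + zt for x, zt in zip(codebook[sent], z)]
decoded = ml_decode(y, codebook, a, z)
```

When a and z are unknown this rule cannot be evaluated, which is the situation the universal decoder is meant to address.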

Since the design of a codebook C that minimizes P_{e,o}(C, R, n) under an input power constraint is prohibitively complex for large n, we shall adopt the random coding approach, where each codeword is randomly chosen with respect to some probability density function (PDF) q(x), independently of all other codewords. It is well known [3] that \bar{P}_{e,o}(q, R, n) = E{P_{e,o}(C, R, n)}, where the expectation is taken over the ensemble of randomly selected codebooks under q, decays exponentially for every R < R(q), where R(q) is a rate depending on q and always less than the channel capacity. The exponential rate of the error probability, E(q, R) = -\lim_{n\to\infty} n^{-1} \log \bar{P}_{e,o}(q, R, n), is called the random coding error exponent.


where \bar{P}_{e,u}(q, R, n) is the average error probability associated with (2).

The proof appears in [4].

The intuitive interpretation of (2) is that log u(x, y) can be thought of as an empirical version of the mutual information between x and y. Thus, we select the input x^i that seems empirically "most dependent" upon the given output vector y. This corresponds to the maximum mutual information (MMI) decoding principle. It should be pointed out that if {z_i} is known to be composed from k < ∞ basis functions {g_j}, the theorem applies with k_n = k in eq. (2) and m ≤ k in eq. (3).

Ideally, one wishes to choose q(·) so as to maximize E(q, R). However, since the maximizing PDF q(·) depends on the unknown channel, there is no way by which the transmitter can optimally select q(·) unless there is a feedback channel from the receiver to the transmitter. The choice of an input PDF of the form of eq. (3) can also be motivated by the fact that the capacity of the Gaussian channel is attained by a Gaussian PDF.

It turns out that the extension of the above theorem to nonmemoryless channels is not trivial. Consider, for example, a Gaussian channel with linear intersymbol interference (ISI), characterized by y_i = \sum_j h_j x_{i-j} + w_i, where {h_j} is the channel impulse response and w_i is Gaussian white noise. The difficulty appears to be in an appropriate definition of the auxiliary backward channel. We conjecture that an appropriate definition of the backward channel in this case will be


W(xly,eA)-C,(eAor)n» 


i-l  i-0  I 


where θ = (σ², α_0, ..., α_k, β_1, ..., β_k) and C_n(θ, k) is a normalization factor chosen such that the above PDF will integrate to unity.

If the fading parameter a and the interfering signal {z_i} are unknown, then the ML decoder is obviously inapplicable. We next demonstrate a decoding procedure which is universal in the sense of being independent of a and {z_i}, and at the same time attaining E(q, R). In other words, let P_{e,u}(C, R, n) denote the error probability associated with the universal rule for a given codebook C, and \bar{P}_{e,u}(q, R, n) = E{P_{e,u}(C, R, n)}. Then, \bar{P}_{e,u}(q, R, n) decays exponentially at the same rate E(q, R) as that associated with the ML decoder. This is analogous to an earlier result by Ziv [1] for finite-alphabet, finite-state channels.

We now turn to present the proposed decoding rule. To this end, define an auxiliary backward channel of order k by the conditional PDF

W(x | y, \theta, k) = (2\pi\sigma^2)^{-n/2} \prod_{i=1}^{n} \exp\left\{ -\frac{1}{2\sigma^2} \left( x_i - \alpha y_i - \sum_{j=1}^{k} \beta_j g_{ij} \right)^2 \right\}    (1)

where x = (x_1, ..., x_n), y = (y_1, ..., y_n) and θ = (σ², α, β_1, β_2, ..., β_k) is the parameter vector of the kth order backward channel. Let {k_n}, n ≥ 1, be any

References

[1] J. Ziv, "Universal Decoding for Finite-State Channels," IEEE Trans. Inform. Theory, Vol. IT-31, No. 4, pp. 453-460, July 1985.

[2] N. Seshadri, "Joint Data and Channel Estimation using Blind Trellis Search Techniques," submitted to IEEE Trans. Commun.

[3] R. G. Gallager, Information Theory and Reliable Communication. New York, Wiley, 1968.

[4] N. Merhav, "Universal Decoding for Memoryless Gaussian Channels with a Deterministic Interference," submitted for publication.
Consider the reliable transmission of information over a discrete-time memoryless channel with mismatched decoding, i.e., where the decoding metric is not necessarily matched to the channel's characteristics. This is a realistic model for time-varying channels or when implementation constraints dictate a given decoder which employs a specific fixed metric.

Hui [1] has derived a lower bound on the capacity of a discrete memoryless channel (DMC) with mismatched decoding, hereafter referred to as Hui's capacity and denoted Ch. Our first result in this work is an extension of this lower bound to an exponential family
of  channels.  This  wider  class  of  channels  includes,  as  special  cases, 
DMC’s,  finite-state  channels,  Poisson  channels  and  Gaussian  channels. 
Some  of  the  results  extend  to  exponential  channels  with  memory  (e.g., 
finite-state  channels),  but  in  this  case  a  single-letter  characterization 
of  the  achievable  rates  is  not  available. 

Motivated by the matched decoding case, we prove that in the random coding regime, Hui's capacity is the highest achievable rate under mismatched decoding. This observation, as well as a sphere packing argument for bounding the maximum possible number of disjoint mismatched decoding spheres, supports Hui's conjecture (recently proved by Balakirsky [2] for binary-input channels) that Ch is the ultimate reliably transmitted rate. New bounds and interesting properties of Ch are presented [3], and relations among Ch, the generalized average mutual information (defined in terms of Gallager's bound in parallel to the matched case) and the generalized cut-off rate are established.

Some indicative examples of practical interest for continuous and discrete-alphabet memoryless channels with various mismatched metrics are worked out. In particular, a two-dimensional AWGN channel (with Gaussian inputs) subjected to a phase offset of θ is considered. It is found that the deleterious effect of the phase offset θ on Ch manifests itself in attenuating the signal power by a factor of cos² θ and in adding an equivalent noise term with power of sin² θ times the signal power. This expression mimics the behavior of the uncoded complex channel with a phase offset.
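The phase-offset effect described above can be made concrete with a short numerical sketch. It only computes the equivalent signal-to-noise ratio implied by the stated attenuation and added noise; the function name `effective_snr` and the numeric values are assumptions for illustration, and no formula for Ch itself is implied.

```python
import math


def effective_snr(signal_power, noise_power, theta):
    """Equivalent SNR when a phase offset theta scales the signal power by
    cos^2(theta) and adds noise of power sin^2(theta) * signal_power."""
    attenuated = signal_power * math.cos(theta) ** 2
    extra_noise = signal_power * math.sin(theta) ** 2
    return attenuated / (noise_power + extra_noise)


# Illustrative values: signal power 10, noise power 1.
snr_aligned = effective_snr(10.0, 1.0, 0.0)           # no phase offset
snr_offset = effective_snr(10.0, 1.0, math.pi / 6)    # 30 degree offset
```

With these numbers a 30 degree offset reduces the equivalent ratio from 10 to roughly 2.14, because the offset both removes useful signal power and converts part of it into noise.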


We proceed to examine specific examples of encoding/decoding mechanisms motivated by the nature of the mismatch. It is demonstrated that the achievable reliable transmitted rate under mismatched decoding may depend on the performance criterion (bit error vs. message error probability) and on the coding strategy (randomized vs. deterministic), in contrast to the well-known behavior of the optimal matched-decoding scenario. As an example, consider a BSC with crossover probability of p < 0.5, where the decoder uses the mismatched metric adapted to a BSC with p' > 0.5 instead of p. In this case Ch = 0 [1]; however, by using a variant of differential encoding one can achieve a positive rate with respect to the bit error probability (while the message error probability goes to unity). Moreover, a randomized strategy (e.g. assigning to any possible message, with equal probability, a properly selected binary codeword or its complement) leads to a positive achievable rate with respect to the message error probability. Thus, in several specific cases, with different error criteria and/or randomized coding strategies, reliable rates exceeding Ch are achievable.
Abstract

A new Markov chain approach to the evaluation of the frame error probability for the M-Algorithm is presented. Using this model, results for values of M=1 to 64 and frame lengths of L=64 to 512 bits have been evaluated for a convolutional code of memory length ν=19 and rate R=1/2. Simulation results are compared to the Markovian model, showing that the technique is attractive for the performance evaluation of suboptimal decoding algorithms for convolutional codes.

Summary 

Suboptimal decoding algorithms in general and the M-Algorithm in particular have received a significant amount of interest lately [1-5]. These algorithms are used to search large trellises where an exhaustive exploration is impractical. Their suboptimal search is heuristically guided to minimize the number of paths to be explored in order to achieve a reasonable bit error performance. However, theoretical analysis of these heuristics is complex and few theoretical methods are available for precise evaluation of the error performance of these algorithms.

In order to establish an upper bound on the error performance of the M-Algorithm, the minimum number of path extensions required to include the correct path at each tree depth must be determined. This problem has eluded analysis.

In this paper we present a new approach to the evaluation of the frame error performance of the M-Algorithm over a binary symmetric channel and additive white Gaussian noise. It is based on a Markovian description of the decoding dynamics of the M-Algorithm and uses the column weight distribution of the code [6]. The column weight distribution of a particular convolutional code represents the number of paths with a certain Hamming weight at each particular depth in the decoding tree.

The Markov chain consists of an "Initial" state, a "Lost" state and a varying number of intermediate states. The "Initial" state represents the decoder behaviour when the channel is error-free, typically at the beginning of the frame or when the channel has been error-free for a sufficiently long period of time. The "Lost" state is an absorbing state which represents an error propagation event due to the loss of the correct path and its ensuing lack of resynchronization. An intermediate state represents the accumulated channel transitions at a given incorrect subset depth since departure from the "Initial" state. The transitions between the states of the Markov chain are a combination of the correct path loss probability and the channel transition probability.

Using this model a transition matrix A may be constructed. For a frame of length L, the frame error probability is then given by the transition probability from the "Initial" state to the "Lost" state in the matrix A^L.
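Reading the frame error probability off the L-th power of the transition matrix can be sketched in a few lines of Python. The three-state chain below (one "Initial" state, one intermediate state, one absorbing "Lost" state) and its transition probabilities are purely hypothetical; the real model derives its entries from the correct path loss and channel transition probabilities.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, power):
    """Raise a square matrix to a non-negative integer power by squaring."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in a]
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


def frame_error_probability(A, L, initial=0, lost=-1):
    """Entry of A^L leading from the Initial state to the Lost state."""
    return mat_pow(A, L)[initial][lost]


# Hypothetical chain: rows/columns are Initial, intermediate, Lost (absorbing).
A = [
    [0.98, 0.02, 0.00],
    [0.90, 0.00, 0.10],
    [0.00, 0.00, 1.00],
]
p_f_64 = frame_error_probability(A, 64)
p_f_512 = frame_error_probability(A, 512)
```

Because "Lost" is absorbing, this entry can only grow with L, so longer frames give a larger frame error probability for the same chain.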
This technique has been applied for a rate R = 1/2 convolutional code of memory ν = 19 with frame lengths varying from L=64 to 512 bits and number of paths varying from M=1 to 64. Results show that, as a first order approximation, a sliding window decoder [7] is a good approximation if M, the number of paths to be explored, is small. However, if the number of explored paths increases, then the number of incorrect subsets to consider must also increase, making the sliding window inadequate.

Using this technique a good upper bound on the frame error performance of the M-Algorithm can be calculated for a given code and some value of M. Once the frame error probability is known, the bit error performance for the M-Algorithm can be approximated by using simple arguments [5]. Assuming that M << 2^ν, the probability of resynchronization tends towards 0. It has been observed through simulations that on average the decoder will lose the correct path in the middle of the frame and that one half of the decoded bits in the erroneous portion of the frame will be in error, leading to an error event of L/4 bits per erroneously decoded frame. A good approximation to the bit error probability is then given by P_b = (1/L)(L/4) P_f = (1/4) P_f. This approximation is supported by extensive simulation results.

In summary, the new Markovian based frame error probability analysis for the M-Algorithm and a binary symmetric channel will be presented. A comparison between simulation results and numerical results shows that when the number of explored paths increases, the number of incorrect subsets in the Markov model should also increase, increasing with it the complexity of the evaluation of A^L. However, the extraction of the frame error probability remains trivial, making the technique attractive for evaluating the error performance of suboptimal decoding algorithms for convolutional codes.
Summary: In this paper we show that systematic convolutional encoders perform as well as nonsystematic ones when they are used together with M-algorithm decoders [1]. We describe the algorithm and give a brief historical review. The following curves show simulation results for the event error probability of the M-algorithm. We compare an optimum distance profile nonsystematic encoder (d_free = 22) and a quick-look-in encoder (d_free = 18) with a systematic encoder (d_free = 13). All encoders have memory m = 20 and in the decoder 32 states are extended at every time instant (M = 32).

Error  Event  Probability 


All  curves  show  the  same  event  error  probability  per¬ 
formance. 

Using criteria for the encoder quality like the optimum distance profile and the optimum profile spectrum [2], any encoder is equivalent to a systematic one over the encoder memory. As long as these distance criteria support the decoder performance, the event error probability of the M-algorithm depends only on M. Therefore, in a range of interesting values of M, systematic encoders should behave like nonsystematic encoders in terms of error event probability, as our simulations show. The decoder complexity is independent of the memory of the encoder. The free distance does not matter as long as it is big enough to correct all paths within the set of extended decoder states. Hence, for M-algorithm decoders, in contrast to the Viterbi decoder, code quality cannot be expressed in terms of the free distance and the distance spectrum. A rapid growth of the column distances is more important than a large free distance.

As a bonus, when used together with the M-algorithm, systematic encoders outperform nonsystematic encoders in terms of bit error probability, as shown in the next picture (frame length = 1024). The reason is that systematic encoders are superior from a correct path loss point of view.


Bit Error Probability (memory 20, M = 32)
Summary: We analyse a list decoding algorithm [1] (M-algorithm [2]) for binary rate R = b/c convolutional codes. In every decoding step, starting from ⌊(log_2 L)/b⌋ + 1, where ⌊.⌋ denotes the integer part, the decoder selects the L most likely code sequences and calculates their successors. This procedure is continued until the decoder reaches the end of the tree or the trellis.
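The select-and-extend procedure can be sketched in Python for a small example. The sketch assumes a rate 1/2 encoder with generators 7 and 5 (octal), hard decisions on a BSC and the Hamming distance as the path metric; these choices, and all function names, are assumptions for illustration and are not taken from the summary.

```python
def parity(x):
    """Parity (mod-2 sum) of the bits of a non-negative integer."""
    return bin(x).count("1") & 1


def step(state, bit, gens, memory):
    """One encoder step: return (output bits, next state)."""
    register = (bit << memory) | state
    outputs = tuple(parity(register & g) for g in gens)
    return outputs, register >> 1


def encode(bits, gens=(0b111, 0b101), memory=2):
    """Encode an input bit list with the (assumed) feedforward encoder."""
    state, out = 0, []
    for b in bits:
        o, state = step(state, b, gens, memory)
        out.extend(o)
    return out


def list_decode(received, list_size, gens=(0b111, 0b101), memory=2):
    """Keep the `list_size` best paths at every depth; return the best one."""
    n_out = len(gens)
    # Each survivor is (accumulated Hamming distance, encoder state, input bits).
    survivors = [(0, 0, [])]
    for t in range(0, len(received), n_out):
        block = received[t:t + n_out]
        extended = []
        for metric, state, bits in survivors:
            for b in (0, 1):
                out, nxt = step(state, b, gens, memory)
                dist = sum(o != r for o, r in zip(out, block))
                extended.append((metric + dist, nxt, bits + [b]))
        extended.sort(key=lambda path: path[0])
        survivors = extended[:list_size]
    return survivors[0][2]


message = [1, 0, 1, 1, 0, 0, 1, 0]
codeword = encode(message)
noisy = codeword[:]
noisy[3] ^= 1                       # a single channel error
decoded = list_decode(noisy, list_size=4)
```

While fewer than `list_size` paths exist, every path is kept, so actual selection only begins once the tree has grown beyond the list size.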

We study the distance properties and the probabilistic performance of the algorithm. We introduce a natural extension to the free distance of the convolutional code, viz., the L-list minimal distance d_L. The L-list decoder corrects all combinations of ⌊(d_L - 1)/2⌋ or fewer errors. Using computer search we found convolutional encoders having maximal L-list minimal distance.

Analogously to the Costello bound we derive the following lower bound on d_L for rate R = b/c binary convolutional codes: There exists a time-invariant convolutional code such that (bound 1)

-  log,(2>-«-l) 

where φ(R) does not depend on L.

For rate R = 1/2 this can be strengthened further (bound 2):


-logj(v^-  1) 


logj(\/2 - 


The bound for rate R = 1/2 can be tightened if we choose and fix the first i + 1 matrices G_k, k = 0, 1, ..., i (bound 3):

d  >  -  2’  -fl)  -H  -  logj  T[o.,i(\/2  -  1)  _ 

-log,(v/2-l) 


packing  type  lower  bounds.  The  random  coding  upper  bound 


E[P(£,)] 


<.  { r-: 
1 1-* 


*Gi(l),  R  <  Rcompi 

*02(1)  log}  L,  R  Rcampi 

^03(1),  R  >  Rcomp, 


where ρ is the solution of the equation ρR = G(ρ), G(ρ) is the Gallager function for the BSC, and O_i(1), i = 1, 2, 3, are values depending on R and ε but not on L. Using the expurgation bound we get:


P(f.)  <  L">*«t>-^-‘)0(l),  R<Rcomp. 

where O(1) is a value depending on R and ε, but not on L.

The derivation of the lower bound for the L-list decoding error probability is based on the corresponding lower bound for block codes. For a given L and R there exists an integer t such that the L-list decoding error probability on the t-th step satisfies the inequality


P{£t)  > 



where w_H(v) is the Hamming weight of v, and ... is the weight enumerator of the t-th truncation of the code.

Based on a Hamming type upper bound for the L-list minimal distance of block codes we prove the following upper bound for d_L.

A.  ^  — +o(logjl,). 


log,(2>-«-l) 


£i!2ll^._0.forl-.oo 

logjL 

Finally, we derive a random coding type upper bound on the decoding error probability P(E_t) of the L-list decoding algorithm on the t-th step and an expurgation type and sphere


Lower bounds for d_L compared with d_L for the systematic ODP code with m = 29.
SUMMARY

'DtUe  iook-up  beecd  decoding  eckcmea  for  convolutionnUy  *ncAded 
karc  been  designed  for  both  block  coding  [&]t  and  convolntioanl 
[8, 3]. However, in the latter case most of the work was done for systematic codes. Nonsystematic codes offer better error correcting capability than systematic codes if more than one constraint length of received blocks are considered [7]. Our approach to table look-up based decoding of nonsystematic convolutional codes was introduced in [1, 5].

A 1/2-rate convolutional encoder is characterised by two generator sequences g^(j) = (g_0^(j), g_1^(j), ..., g_ν^(j)), j = 1, 2, where ν is the constraint length of the code, i.e., the number of memory elements in a minimal realisation of the convolutional code [4]. The output constraint length is defined as n_A = 2(ν + 1) and is equal to the number of encoded bits affected by a single input information bit [7]. An input information sequence u is encoded into two encoded output sequences v^(j), j = 1, 2, using v = uG, where v = (v_0^(1), v_0^(2), v_1^(1), v_1^(2), ...) is the composite encoded sequence, also called a codeword, obtained by multiplexing the two encoded sequences, and G is the semi-infinite code generator matrix [7].
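The encoding with two generator sequences followed by multiplexing can be sketched in Python as two binary convolutions whose outputs are interleaved. The generator sequences and all function names below are illustrative assumptions, not values from the summary.

```python
def conv_mod2(u, g):
    """Convolve two bit sequences over GF(2)."""
    out = [0] * (len(u) + len(g) - 1)
    for i, ui in enumerate(u):
        if ui:
            for j, gj in enumerate(g):
                out[i + j] ^= gj
    return out


def encode_two_streams(u, g1, g2):
    """Encode u with both generator sequences and multiplex the results."""
    v1 = conv_mod2(u, g1)
    v2 = conv_mod2(u, g2)
    v = []
    for first, second in zip(v1, v2):
        v.extend((first, second))
    return v


# Assumed example generators with two memory elements.
g1 = [1, 1, 1]
g2 = [1, 0, 1]
u = [1, 0, 1]
v = encode_two_streams(u, g1, g2)
# A single information bit touches every output bit of the impulse response.
impulse = encode_two_streams([1], g1, g2)
```

For these generators the impulse response has six bits, which matches the idea that one input bit influences two output bits for each of the three taps of each generator.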

The encoded bits in 1/2-rate coding are generated at twice the input information rate. However, it is possible to find an encoding operation that relates blocks of input information bits to equal length blocks of encoded bits [1, 5].

Proposition: For 1/2-rate convolutional coding with constraint length ν, there exists a correspondence between equal length blocks of input information bits and the encoded bits. The length of these corresponding blocks is 2ν bits. (For proof, see [1, 3].)

We can formalise this relationship as follows. Let [u]_{t,t+2ν-1} = (u_t, u_{t+1}, ..., u_{t+2ν-1}) be a 2ν-bit block of the input information sequence, and [v]_{2t,2t+2ν-1} = (v_{2t}, v_{2t+1}, ..., v_{2t+2ν-1}) be the 2ν-bit block of the corresponding encoded sequence. Given the generator sequences g^(j), j = 1, 2, for a 1/2-rate convolutional code with constraint length ν, we define the reduced encoding matrix as


eI*’ 

eI’-'i  eI‘> 

eI’’ 

[G]i»  = 

•r’ 

;  Ei‘> 

Ei*> 

eI'’  . 

.  E<‘-\ 

Ei’-’i 

eJ*’ 

Ei'» 

(If 


where blanks denote zeros. A 2ν-bit input information block is encoded to obtain the corresponding encoded block of length 2ν bits as follows.

[v]_{2t,2t+2\nu-1} = [u]_{t,t+2\nu-1} [G]_{2\nu}.    (2)

For 1/2-rate convolutional coding with constraint length ν, an encoded sequence obtained by equation (2) can be uniquely decoded if and only if the inverse of the reduced encoding matrix exists, i.e., [G]_{2ν} [G]_{2ν}^{-1} = [I]_{2ν}, where [I]_{2ν} is the 2ν x 2ν identity matrix. A 2ν-bit input information block is recovered by decoding the corresponding 2ν-bit encoded block as follows:

[u]_{t,t+2\nu-1} = [v]_{2t,2t+2\nu-1} [G]_{2\nu}^{-1}.    (3)

It is desirable to find nonsystematic codes that can be decoded quickly (e.g. Quick-Look-In codes [7]). The decoding operation specified by equation (3) can be implemented as a look-up table (or can be built into the decoder hardware), and leads to the definition of a new class of convolutional codes that allow fast decoding. We define 1/2-rate convolutional codes with constraint length ν that have an invertible reduced encoding matrix [G]_{2ν} as invertible codes. Note that both systematic and nonsystematic codes can belong to the class of invertible codes.

The received sequence r can be expressed as r = v + e, where e is the error sequence. The syndrome sequence s is data independent and can be calculated one bit at a time as follows:

s_t = < [e]_{2t,2t+2\nu+1}, h >,    (4)

where < ·, · > indicates the vector dot product, [e]_{2t,2t+2ν+1} is a (2ν + 2)-bit block of the error sequence, and h is the (ν + 1)th column of the syndrome former H^T described in [4]. Since each received bit affects (ν + 1) syndrome bits, it is sufficient to consider a finite length syndrome sequence denoted by [s]_{t,t+ν} and called the s-address. In order to fully compute [s]_{t,t+ν}, 4ν + 2 encoded bits are needed [1]. From equation (4),

[s]_{t,t+\nu} = [e]_{2t,2t+4\nu+1} [H^T]_{(4\nu+2) \times (\nu+1)}    (5)

where [H^T]_{(4ν+2)x(ν+1)} is a submatrix of the syndrome former H^T, starting with the (ν + 1)th column, and [e]_{2t,2t+4ν+1} is a (4ν + 2)-bit block of the error sequence. Each [e]_{2t,2t+4ν+1} is an error pattern of finite length corresponding to the s-address [s]_{t,t+ν}.

Equation (5) provides the basis for table-driven correction of errors. During the correction table generation phase, error patterns of length at least (4ν + 2) bits are considered. The s-addresses are calculated for the most likely error patterns. Typically, we first consider single bit errors. Next, two-bit errors are considered, etc. The bits in error can be located in any of the (4ν + 2) positions. The bit pair (e_{2t+2ν}, e_{2t+2ν+1}) participates in the calculation of every bit of the s-address [s]_{t,t+ν}. This pair is called the correction pair, and is the location where correction will be applied. The correction information for the correction pair is denoted by (e^(1), e^(2)), and is stored in the table at address [s]_{t,t+ν}. The table generation continues for higher order errors until all the entries in the table are filled.
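The table generation phase can be sketched in Python as an enumeration of error patterns by increasing weight, storing for each new s-address the error bits that fall on the correction pair. This is only a sketch under stated assumptions: the syndrome is computed with the textbook parity check for two generator sequences (first error stream convolved with the second generator plus second error stream convolved with the first), the window spans 2ν + 1 time steps with two error bits each, and the generators and all names are hypothetical rather than taken from the summary or from [4].

```python
from itertools import combinations


def s_address(window, g1, g2):
    """Syndrome bits fully determined by a window of interleaved error bits.

    `window` is (e1_0, e2_0, e1_1, e2_1, ...) over 2*nu + 1 time steps;
    the result has nu + 1 bits.
    """
    nu = len(g1) - 1
    e1, e2 = window[0::2], window[1::2]
    bits = []
    for tau in range(nu, 2 * nu + 1):
        s = 0
        for i in range(nu + 1):
            s ^= (e1[tau - i] & g2[i]) ^ (e2[tau - i] & g1[i])
        bits.append(s)
    return tuple(bits)


def build_correction_table(g1, g2, max_weight):
    """Map each s-address to the correction for the middle bit pair."""
    nu = len(g1) - 1
    length = 2 * (2 * nu + 1)
    table = {(0,) * (nu + 1): (0, 0)}      # zero syndrome: no correction
    for weight in range(1, max_weight + 1):
        for positions in combinations(range(length), weight):
            window = [0] * length
            for p in positions:
                window[p] = 1
            address = s_address(window, g1, g2)
            # The first (lowest-weight) pattern reaching an address keeps it;
            # later patterns with the same address are conflicts.
            table.setdefault(address, (window[2 * nu], window[2 * nu + 1]))
    return table


g1 = [1, 1, 1]
g2 = [1, 0, 1]
table = build_correction_table(g1, g2, max_weight=2)
```

Patterns that reach an already occupied address are silently skipped here, which is the conflict situation that a longer s-address is meant to resolve.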

Since there are more error bits than syndrome bits in equation (5), multiple error patterns of length (4ν + 2) can generate the same (ν + 1)-bit s-address. All such patterns are syndrome-indistinguishable [4] (i.e. there is a conflict). The size of the correction table can be increased so that conflicting error patterns can map to different (longer) s-addresses. This is known as the syndrome extension method [1, 3]. Let the s-address be extended by s bits to [s]_{t,t+ν+s}. Then the corresponding correction window becomes (4ν + 2) + 2s bits long. The extension can be one-sided or two-sided. A two-sided extension results in a correction window extended on both sides. This extension of the correction window allows our approach to perform better than minimum distance feedback decoding.

During the error correction phase, an s-address is calculated based on a block of the received bits. It is then used to access the correction table to retrieve the correction information. The correction is applied to the appropriate pair of received bits and the s-address is recalculated. This is the feedback step. If the new s-address is zero, the correction is assumed to be perfect. A new received bit pair is shifted into the correction window and the process is repeated. If the new s-address is not zero, the correction is reversed and the bit is allowed to pass. Feedback decoders suffer from error propagation and appropriate measures need to be taken to avoid or control it.

For illustration, consider the performance of a 1/2-rate, constraint length ν = 7 convolutional code with table driven decoder shown in Figure 1. A binary symmetric channel (BSC) is assumed and a simple correction algorithm is used. More sophisticated algorithms that promise better performance are being studied. This corresponds to a table set up to correct all errors up to 4 bits and all 5, 6 and 7-bit errors that do not have conflicting addresses. The model is seen to be better than the simulation results. This is because the model does not account for error propagation.
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Figure  1:  Thbls-Drivca  Error  Correction  Performance 
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Abstract  The  Pe-criterion  is  a  recent  analysis  [1]  of  sequen¬ 
tial  BSC  channel  decoding  based  on  the  design  condition  that 
decoders  may  fail  with  a  probability  Pe  >  0.  The  result  is  a 
definite  boundary  for  the  tree  search  region  and  a  well  defined 
estimate  of  path  numbers  sewched. 

This work extends the criterion to the memoryless binary
input soft decision channels resulting from the quantization of
the AWGN channel output at the receiver, the binary input
Q-ary output (Q > 2) and the binary input continuous output
(semicontinuous) channels [2]. We show that the use of soft
decisions implies a reduced search region compared to the BSC
case and that large savings in the number of paths searched may
be achieved when soft decision information is available.

Summary 

We  initially  derive  the  shape  of  the  search  region  in  a  gener¬ 
alized  distance  versus  depth  diagram.  The  generalized  distance 
is  a  function  of  the  metric  for  the  standard  sequential  decoding 
analysis  [3].  In  the  generalized  distance  versus  depth  diagram, 
the  search  region  is  bounded  by  the  drop  line,  the  set  of  points 
outside  of  which  the  correct  path  in  the  tree  wanders  with  prob¬ 
ability  Pe  or  less.  It  is  unique  for  each  channel  and  each  Pe, 
and  independent  of  the  code  rate,  R.  Comparisons  with  the  cor¬ 
responding  drop  lines  for  the  BSC  under  the  same  value  of  Pe 
show  the  decrease  in  the  search  region  resulting  from  the  use  of 
soft decision (Figure 1).

We can then estimate the expected number of paths that a
non-backtracking algorithm views within its search region, by
applying an analytical method based on difference equations
[1,2]. A second derivation of it is obtained for the semicontinuous
channel, using an integral equation [2].


Figure 1: Drop Lines for the Binary Input Binary (BSC),
Ternary (BSEC), 4-ary (BI40), 8-ary (BI80) and Continuous
(AWGN) Output Channels. Axes: generalized distance D versus depth n.


A numerical analysis is then performed for the BSC and some
equivalent soft decision channels. For a fixed Ebt/No, the BSC
and the semicontinuous channel are unique channels. This is not
so for the Q-ary output channels with Q > 2, where a family
of channels exists for a given Ebt/No. Therefore, an additional
optimization is necessary in order to find the best Q-ary
output channel in terms of the minimum number of paths searched.
Table 1 shows the results for Ebt/No = 4.323 dB (which corresponds
to the crossover probability p = 0.01 in the BSC case)
and different values of R and Pe.
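
The correspondence between Ebt/No = 4.323 dB and p = 0.01 can be checked with a short calculation. The sketch below assumes coherent antipodal (BPSK) signalling with hard decisions on the AWGN channel, so that p = Q(sqrt(2 Ebt/No)), and reads Ebt as the energy per transmitted (channel) bit; both points are assumptions, as the text does not state the modulation. The function names are illustrative only.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bsc_crossover(ebt_over_n0_db):
    """Crossover probability of the BSC obtained by hard-quantizing an
    AWGN channel with antipodal signalling (assumed model)."""
    ratio = 10.0 ** (ebt_over_n0_db / 10.0)   # dB -> linear
    return q_function(math.sqrt(2.0 * ratio))


p = bsc_crossover(4.323)
print(f"p = {p:.4f}")
```

Under these assumptions the computed crossover probability is very close to 0.01, which agrees with the value quoted for Table 1.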

The results emphasize once again the importance of the use
of soft decision in the decoding process, this time from the point
of view of the number of paths searched. It is shown that large
savings in path searching can be achieved, even when the most
simple form of soft decision is applied (Q = 3). In addition,
it is shown that, as opposed to common belief [4], soft decision
savings are not necessarily related to an increase in the channel
capacity, as compared to the hard decision channel (BSC). For
many cases, the best soft decision channel has even smaller capacity
than the corresponding BSC, even though it represents
substantial savings in paths searched. Furthermore, the best
Q-ary output quantization depends critically on R and Ebt/No.
The latter in turn shows that signal level and noise variance
estimation, or equivalently automatic gain control (AGC), is
important in the design of limited search decoders.

Table 1: Number of paths searched for the Binary Input Binary
(BSC), Ternary (BSEC), 4-ary (BI40), 8-ary (BI80) and
Continuous (AWGN) Output Channels (Ebt/No = 4.323 dB).

Abstract

A class of algorithms performing Maximum Likelihood Sequence
Detection under various structural and complexity constraints is derived
(BSC or AWGN). Complexity is measured by the number of
paths used. By partitioning the S states into C classes and selecting B
paths into each class, the signals closest to the received one shall be
selected and hence Np=BC paths are used. This class of algorithms
has the name SA(B,C) (SA=Search Algorithm) and the Viterbi Algorithm
(VA) is the unconstrained solution denoted SA(1,S).

An  analysis  method  concerning  the  probability  of  the  first  error  event 
at  large  SNR  is  developed  for  the  whole  SA(B,C)  family  and  results 
in  an  analysis  tool  named  the  Vector  Euclidean  Distance  (VED)  of 
which  the  traditional  Euclidean  Distance  (ED)  is  a  scalar  special  case. 
The  smallest  number  of  paths  resulting  in  the  same  asymptotic  detec¬ 
tion  performance  as  the  VA  is  calculated  for  several  classes  of  trellis 
codes. 

Summary 

What limits the use of the VA is the number of states, S, which becomes
very large if e.g. joint MLSD is applied to a whole system. By
instead setting initially the number of paths to be traced in the trellis
and then requiring that MLSD is to be performed, a family of MLSD
procedures is the result with the complexity as a parameter. Structural
constraints can also be imposed but this will be at the price of an
increased number of paths. A structure can be given by partitioning
the S states into C classes [2] and then in each iteration keeping B
paths into each class. Assuming M-ary transmission, the BC paths
will be extended to MBC, from which BC are selected again, in each
recursion. This selection procedure is important and if the paths with
the smallest ED (Hamming distance) are selected, MLSD will be performed
for the AWGN channel (BSC).
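
The selection step of one recursion can be sketched as follows. This is a minimal illustration and not the authors' implementation: the tuple layout `(distance, state, path)`, the function name `select_survivors` and the `class_of` mapping are all hypothetical. Given the MBC extended candidates, it keeps the B candidates with the smallest accumulated distance in each of the C classes.

```python
from collections import defaultdict


def select_survivors(candidates, num_per_class, class_of):
    """Keep the `num_per_class` (B) best candidates in each state class.

    candidates: iterable of (distance, state, path) tuples, i.e. the
                extended paths of one recursion
    class_of:   function mapping a state to its class label
    """
    by_class = defaultdict(list)
    for cand in candidates:
        by_class[class_of(cand[1])].append(cand)
    survivors = []
    for label in sorted(by_class):
        # smallest accumulated distance first
        best = sorted(by_class[label], key=lambda c: c[0])[:num_per_class]
        survivors.extend(best)
    return survivors


# Toy case: S = 4 states split into C = 2 classes (state parity), B = 1.
extended = [(3.0, 0, "a"), (1.0, 2, "b"), (2.5, 1, "c"), (0.5, 3, "d")]
kept = select_survivors(extended, num_per_class=1, class_of=lambda s: s % 2)

# SA(1,S): every state is its own class, which gives Viterbi-style selection.
merging = [(2.0, 0, "x"), (1.5, 0, "y"), (4.0, 1, "z")]
viterbi_like = select_survivors(merging, num_per_class=1, class_of=lambda s: s)
```

Choosing `class_of` as the identity with B = 1 reproduces the per-state survivor selection of the VA, while a single class (C = 1) gives the SA(B,1) member discussed in the text.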

The probability of a first error event for the AWGN channel is [1]

where the SNR, is large and is the (normalized and squared)
ED between any two different paths in the trellis. The first term
is the traditional asymptotic error event probability for the VA
whereas the second is an increment due to the non-exhaustive search
of the trellis and is associated with the probability that the correct path
is lost after the selection procedure. The quantity is determining
the asymptotic probability of this CPL (correct path loss) and for
reasons given below called the VED.

The VED d^ for the most efficient member of the SA(B,C) family,
SA(B,1), is associated with a B x B matrix £ and a B x 1 vector fi
whose entries are given by

a.=  i,;=l,2,  ...,B 

‘='-2 . B 

where d_i is the ED between the correct path and contender #i. Also,
d_ij is the ED between contenders #i and #j. When X has full rank,


and the minimization is performed over the constellation of B paths.
Should there be C > 1 classes, the minimization is first performed for
each class and then over the classes. When £ has rank R < B, an R x R
sub-matrix is taken out from X and the constraints in the minimization
are also changed [1]. It is always possible, however, to find a set of A
paths called active (or outer) such that

where  the  dimensions  are  A  x  1  and  A  x  A  respectively  [1]. 

To  achieve  the  same  asymptotic  detection  performance  as  the  VA,  it  is 
necessary  to  have  By  considering  SA(B,1),  the  most 

efficient member of the SA(B,C) family (and also any other algorithm),
it is now possible to find the least B for which this inequality
is satisfied, denoted B*. For the class of trellis codes built
up from convolutional codes and antipodal modulation the result is
B* ≈ √S, which is asymptotic in the sense that the number of states is
very large. By considering codes having from say 64 to 1024 states,
the asymptotic result is found to be good also for these. For yet another
class of trellis codes, namely r=2/3 convolutionally encoded 8PSK (often
referred to as Trellis Coded Modulation, TCM), the same result still
applies. For Continuous Phase Modulation (CPM) no simple rule in complexity
reduction, S/B*, to establish asymptotic detection optimality is applicable
but substantial savings are demonstrated, see example below.


The developed analysis tools are general so that coded systems and
systems where the channel memory has been included in the system
model (trellis) now can be exactly analyzed.

Abstract

In this paper, we present efficient bidirectional sequential
decoding (BSD) techniques. With BSD, the code tree is
searched from the root and end states of the encoded tree simultaneously.
It is shown by analysis as well as computer
simulations that with BSD, the computational variability per
decoded block can be substantially decreased. In fact it is
shown that the Pareto exponent of the distribution of the block
computational effort with BSD is twice that with unidirectional
sequential decoding (USD). Good codes suitable for BSD are
found. Also, an efficient bidirectional multiple stack algorithm
(BMSA) is proposed and analyzed. This BMSA offers
a good trade-off between computational effort and error performance.

Summary 

Sequential decoding is a very powerful decoding technique
for convolutional codes. The main drawback of sequential
decoding is the variability of its decoding effort. As
a consequence of this variability, the decoding effort for a
given data block may exceed, in certain situations, the physical
limitations of the decoder, leading inevitably to buffer
overflows and information erasures. In the past several modifications
have been developed to reduce the computational
variability of sequential decoding [1-4]. In this paper, new
decoding techniques which further alleviate this drawback of
sequential decoding are proposed and analyzed.

In a system using convolutional coding and sequential
decoding, information is usually transmitted in blocks, and
each block is terminated by a tail of some known bits. Starting
from the root node of the encoded tree, a conventional
sequential decoder moves into the tree in the forward direction,
one branch at a time, along the most likely transmitted
path. Decoding of a block is terminated whenever the decoder
reaches the end node of the tree. Since the final encoder state
is known by the decoder, decoding can also be performed in
the backward direction.
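
As an illustration of conventional (forward-only) sequential decoding of a terminated block, the following is a minimal sketch of a stack decoder. Everything specific in it is an assumption chosen for the example and not taken from the paper: the rate-1/2, memory-2 code with generators (7, 5) octal, the BSC with crossover probability `p`, the Fano metric with bias equal to the code rate, and all function names. It does not implement the bidirectional search.

```python
import heapq
import math


def encode_branch(state, bit):
    """One step of a rate-1/2, memory-2 encoder, generators (7, 5) octal."""
    s1, s2 = state
    out = (bit ^ s1 ^ s2, bit ^ s2)
    return (bit, s1), out


def encode(info_bits, tail=2):
    """Encode a block and append `tail` known zero bits."""
    state, coded = (0, 0), []
    for bit in list(info_bits) + [0] * tail:
        state, out = encode_branch(state, bit)
        coded.extend(out)
    return coded


def stack_decode(received, num_info, p, rate=0.5, tail=2):
    """Forward stack algorithm; returns (decoded bits, nodes extended)."""
    agree = math.log2(2 * (1 - p)) - rate    # Fano metric, matching bit
    differ = math.log2(2 * p) - rate         # Fano metric, differing bit
    total = num_info + tail
    # heap entries: (-metric, tie-break counter, depth, state, path)
    heap = [(0.0, 0, 0, (0, 0), ())]
    counter, visited = 1, 0
    while heap:
        neg_metric, _, depth, state, path = heapq.heappop(heap)
        visited += 1
        if depth == total:                   # end node of the tree reached
            return list(path[:num_info]), visited
        choices = (0, 1) if depth < num_info else (0,)   # tail bits are known
        r = received[2 * depth: 2 * depth + 2]
        for bit in choices:
            nxt, out = encode_branch(state, bit)
            gain = sum(agree if a == b else differ for a, b in zip(r, out))
            heapq.heappush(
                heap, (neg_metric - gain, counter, depth + 1, nxt, path + (bit,)))
            counter += 1
    raise ValueError("stack emptied before reaching the end node")


info = [1, 0, 1, 1, 0, 0, 1, 0]
clean = encode(info)
noisy = list(clean)
noisy[5] ^= 1                                # one channel error
decoded_clean, nodes_clean = stack_decode(clean, len(info), p=0.05)
decoded_noisy, nodes_noisy = stack_decode(noisy, len(info), p=0.05)
```

On the error-free block the decoder extends only the nodes of the correct path, while the single channel error forces extra node extensions; this block-to-block variation in effort is the computational variability discussed above.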

We propose in this paper an efficient sequential decoder
that explores the tree simultaneously in both forward and
backward directions. The bidirectional search idea has been
used for computing the free distance of convolutional codes
[5, 6]. Recently, Rouanne and Costello have applied the bidirectional
stack algorithm for computing the distance spectrum
of trellis codes [7]. This idea of bidirectional decoding has
also been applied to the M-algorithm [8], and to the decoding
of block codes [9]. The BSD algorithm proposed in this paper
is based on the well known stack algorithm [1], and hence
it is called bidirectional stack algorithm (BSA). In the BSA,
two separate stacks are used. One is used for the forward
search of the tree and is called the forward stack (FS). The
other stack is used for the backward search and is called the
backward stack (BS). Starting from the root and final states
of the encoder, forward and backward search operations are
performed simultaneously according to the regular stack algorithm.
The tree search is terminated whenever the best path
on the forward or the backward direction merges with a path
on the opposite direction.

With BSD, it is desirable to use codes that possess
the same distance profile on both forward and backward
directions. Using computer search techniques, we have found
good non-systematic rate-1/2 codes suitable for BSD.

It is shown by analysis and computer simulations that
with BSD, the computational variability per decoded block
can be substantially decreased. In fact it is shown that the
Pareto exponent of the distribution of the block computational
effort with BSD is twice that with USD.

The idea of bidirectional sequential search can be incorporated
into the multiple stack algorithm (MSA) [4]. An efficient
bidirectional multiple stack algorithm (BMSA) is described
and analyzed. It is shown that this BMSA offers
very good performance in terms of both error probability
and computational effort.

It is well known that the behavior of sequential decoding is limited
by its computational effort [1]. Let C denote the number of tree
nodes examined in order to make a correct decision. Then the distribution
of computation [1] is the distribution of C. Traditionally, the
performance of sequential decoders has been analyzed using random
coding arguments which obtain results in the form of averages over the
ensemble of random tree codes [1]. Based on this analysis, it is well
known that the distribution of C is essentially Pareto distributed, a
function of code rate, but independent of the code's constraint length,
K [1]. However, by their very nature, ensemble averages are not tied
directly to the properties of a particular code and therefore, techniques
for the performance analysis of specific codes are highly desirable.

In this paper, we investigate the performance of the Fano and
stack decoders [1] using exact analysis methods based on importance
sampling. In contrast to the classical analysis, the simulation-based
analysis  presented  in  this  paper  uses  no  random-coding  arguments  and 
is  applicable  to  specific  time-invariant  convolutional  codes.  Hence,  it 
serves  as  a  useful  complement  to  the  ensemble  average  analysis  as  one 
can  study  the  characteristics  of  sequential  decoders  for  any  given  code 
and  operating  condition. 

Fig.  1  shows  the  computational  effort  of  various  convolutional 
codes  operating  over  a  binary  symmetric  channel  (BSC)  [1]  and  em¬ 
ploying  the  stack  decoder.  The  simulation  results  show  an  interesting 
effect;  the  computational  effort  improves  as  K  decreases.  This  shows 
the  effect  of  the  “remerging  phenomenon”  [1]  on  the  computational 
effort  of  a  sequential  decoder.  In  brief,  for  a  code  with  small  K,  the 
incorrect  paths  traced  by  the  decoder  tend  to  merge  more  often  with 
the  correct  path,  thereby  resulting  in  an  undetectable  error.  Hence, 
for  a  given  SNR  ratio,  the  distribution  of  C  actually  depends*  on  K. 
However,  as  K  increases,  this  dependency  becomes  less  significant  and 
the  distribution  tends  to  converge  with  the  Pareto  tail.  Note  that  the 
classical  analysis  [1]  shows  no  such  dependence  and/or  characteristics. 


Fig.  1:  Sequential  Decoding  Computational  Effort 
Performance  Over  a  BSC  with  SNR  =  9  dB. 


For the hard-quantized memoryless channels, the distribution of C
presents another interesting feature; the curves exhibit a stair-case
appearance before assuming the usual asymptotic Pareto behavior. This
effect becomes more obvious as SNR increases. For low-to-moderate
values of SNR, however, the curves present a smooth appearance without
any abrupt changes. It is noted that this phenomenon is not apparent
in the case of the unquantized additive white Gaussian noise (AWGN)
channel.

It is well known that optimal distance profile (ODP) and optimal
free distance (OFD) codes make excellent choices for sequential
and Viterbi decoding, respectively [1]. To compare the relative
performance of the two types of codes, several simulations were conducted.
It was verified that the ODP codes perform better than OFD ones for
all SNR conditions. The results of the study also indicated that
systematic ODP codes performed much better than the non-systematic
ones. However, this improvement in the distribution of C performance
was again at the expense of increased error probability.

Throughout this study, we used the Fano metric [1] with the bias
term, B, equal to the code rate, R. Several simulations were conducted
in order to investigate a good value of B. Using some recently developed
techniques by the authors [2] for estimating the bit error rate (BER),
the effect of B on BER was also considered. A summary of the results
of this investigation will be presented.
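
For reference, the role of the bias term can be made concrete with the usual per-bit form of the Fano metric on a BSC, log2(P(r|v)/P(r)) - B, which for equiprobable inputs (an assumption) becomes log2(2(1-p)) - B for an agreeing bit and log2(2p) - B for a differing bit. The sketch below uses hypothetical names and a hypothetical crossover probability; it is not the simulation code used in the study.

```python
import math


def fano_branch_metric(received, hypothesized, p, bias):
    """Fano metric of one branch on a BSC with crossover probability p.

    `bias` is the term B of the text; choosing bias equal to the code
    rate R gives the classical Fano metric.
    """
    agree = math.log2(2.0 * (1.0 - p)) - bias
    differ = math.log2(2.0 * p) - bias
    return sum(agree if r == v else differ
               for r, v in zip(received, hypothesized))


# Rate-1/2 branch (two bits), B = R = 0.5, hypothetical p = 0.01.
m_good = fano_branch_metric((1, 0), (1, 0), p=0.01, bias=0.5)
m_bad = fano_branch_metric((1, 0), (0, 1), p=0.01, bias=0.5)
```

A larger bias lowers the metric of every branch by the same amount per bit, so it changes how long an incorrect path is followed before its metric falls below that of competing paths.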

The  performance  of  the  Fano  decoder  depends  on  the  value  of 
threshold  increment,  A  [1].  Our  simulations  indicate  that  when  A  is 
increased  initially,  the  computational  effort  (determined  by  considering 
the forward looks) of the Fano decoder improves. As A is increased
further,  the  computational  effort  severely  degrades.  In  contrast,  the 
distribution  of  the  number  of  (distinct)  nodes  searched  in  order  to 
make  a  specific  correct  decision  always  degrades  as  A  is  increased. 
This  dependence,  however,  becomes  less  significant  for  higher  SNR 
ratios. 

Finally, the relative performance of hard-decision decoding versus
soft-decisions was investigated. An interesting result was obtained; the
computational effort over the unquantized and hard-quantized AWGN
channel is almost identical when the loss associated with hard decisions
is about 2.3 dB. It was also found that the BER's for the two operating
conditions were identical. Hence, it seems that if a sequential decoder
operates over an unquantized AWGN channel, then to achieve the
same performance over a hard-quantized AWGN channel, the decoder
must generate an additional signal power of the order of 2.3 dB.

*It is noted that in the ensemble average analysis, the path merging
phenomenon is carefully avoided by the use of long constraint length codes.

Abstract

This paper describes the use of the sequential stack algorithm
to decode cyclic (or extended cyclic) block codes. Once a
block code is endowed with a trellis structure, decoding with
any of the convolutional decoding algorithms is viable. Since
trellises for block codes are very wide, a sequential algorithm,
working at moderate signal-to-noise ratios, is an effective
decoding alternative to the Viterbi algorithm. Using Wolf's
trellis, Chang and Yao's sequential stack algorithm, and the
Fano metric, the (24,12) Golay code can be efficiently decoded.
Computer simulations show that by 6 dB the sequential
algorithm is the most efficient (using Be'ery and Snyders'
definition of complexity) soft decoding algorithm for the
(24,12) Golay code.

Summary 

Owing to their algebraic properties, linear block codes are
typically decoded using algebraic techniques. Generally, these
algebraic techniques make hard decisions on the received bits,
causing an inherent loss of 2 dB in error performance. On
the other hand, convolutional codes are decoded using the
Viterbi algorithm or a sequential algorithm which use soft
decisions and hence have a 2 dB advantage over block codes.
Therefore, the ability to extend the convolutional decoding
techniques to block codes would clearly be advantageous.
By applying the Viterbi algorithm to a trellis [Wolf78],
block codes can be decoded using soft decisions. Unfortunately,
the width of this trellis grows exponentially with the number
of parity symbols, thereby making the Viterbi algorithm
inefficient. A solution to this problem is the use of a
sequential decoding algorithm.

Given a trellis, a convolutional decoding algorithm such as
the sequential stack algorithm can be applied. A trellis for a
cyclic or extended cyclic code can be constructed by using
the code's shift register encoder [Wolf78]. The advantage of
using the encoder is that the sequential algorithm can generate
trellis states as needed rather than having to store the complete
trellis beforehand. This investigation uses an improved
stack algorithm which stores partial paths in a priority queue
[ChYa86]. The priority queue is highly parallel and hence
most comparisons are done simultaneously, making the
algorithm all the more efficient. The Fano metric, used
when decoding convolutional codes, is used as a measure to
determine the best path through the trellis.

Following Snyders and Be'ery [SnBe89], complexity is
measured in terms of equivalent real number additions. The
sequential algorithm manipulates four pieces of information
(viz. state, metric, path, and depth). Of these, the metric is
the only real number and thus the two metric operations
solely comprise the complexity. The first operation is
metric addition, which is performed when the branch metric
is added to the path metric. The second operation is metric
comparison, which is performed in the priority queue after
every deletion or insertion.

As a comparison, the complexity of several soft decision
techniques used to decode the (24,12) Golay code are listed
below. Simulations (AWGN channel with BPSK modulation)
were performed to measure the complexity and error
performance of the stack algorithm. Based on these simulations
the sequential algorithm is the better algorithm for decoding
the Golay code when the signal-to-noise ratio is at least:


Technique               Maximum Complexity   SNR
Correlation Decoder     98303                2 dB
Viterbi Algorithm       20473                3 dB
Conway-Sloane (86)      1614 *               5 dB
Be'ery-Snyders (86)     1551 *               5 dB
Forney (88)             1351 *               5 dB
Snyders-Be'ery (89)     827 *                6 dB
Vardy-Be'ery (91)       651 **               6 dB

* [SnBe89]   ** [VaBe91]


As signal-to-noise ratios increase the sequential algorithm
quickly becomes the most efficient algorithm for decoding
block codes. For high signal-to-noise ratios the algorithm
approaches its minimum decoding complexity of 292 addition
equivalent operations. The simulations also confirm that
the sequential algorithm performs maximum likelihood soft
decision decoding.

Abstract

Recently, a tree-structured polytopal vector quantization scheme,
referred to here as the Principal Component Vector Quantization
algorithm (PCVQA), has been developed and has been shown to have
a design complexity only linearly proportional to the codebook size.
This paper proposes an efficient technique to implement PCVQA so
that it can be made viable for the real-time environment.

I.  An  Overview 

Vector quantization (VQ) has been considered as a viable technique
for still image data compression [1-2]. One of its advantages is that
although its encoding is a complex operation, its decoding is a simple
table look-up [3]. However, many VQ techniques, especially the
clustering-based ones, are seriously hampered by complexity, particularly
design complexity. It is well known that the design complexity of
unconstrained VQ algorithms is exponentially proportional to both
the codebook size and the dimension of input vectors. An effective
approach to reducing the complexity is to consider constrained VQ
techniques. A recently developed VQ technique, referred to here as
the principal component vector quantization algorithm (PCVQA), is
one such approach.

It  was  shown  that  the  design  complexity  of  PCVQA  is  only  linearly 
proportional  to  the  codebook  size  [5,6].  The  fundamental  concept 
of  PCVQA  rests  on  the  use  of  principal  component  as  the  normal 
direction  of  the  partitioning  hyperplanes  in  designing  a  tree-structured 
polytopal  VQ.  Thus,  the  design  complexity  of  PCVQA  is  critically 
related  to  the  complexity  of  estimating  the  principal  components. 

The objective of this paper is to develop techniques for further
reducing the implementational complexity of PCVQA. Of particular
interest here is still image compression for real-time codebook
retransmission in applications including HDTV broadcasting. The
issues addressed here are those of complexity, quality of coded images
measured by peak signal-to-noise ratio, and transmission bit rates.

The  proposed  approach  is  to  implement  PCVQA  in  combination 
with  the  method  of  self-organizing  codebook  [7]  and  JPEG  (Joint  Pho¬ 
tographic  Experts  Group)  [8]  for  still  image  compression.  Simulation 
results  show  that  this  approach  can  achieve  better  performance  than 
JPEG  does  alone.  The  price  for  achieving  such  good  performance  is 
only  a  slight  increase  in  the  design  complexity.  The  amount  of  such 
complexity  increase  is  governed  by  the  complexity  of  calculating  the 
principal  components. 

Four numerical methods are examined here for calculating the principal
components, namely, the gradient descent method, the power
method, the eigenvalue shift acceleration, and the modified Aitken's
δ² acceleration [9].

This paper also considers a local search encoding scheme to further
improve PCVQA's performance, especially when the input vectors
tend to be clustered as in the case of transform VQ. It is known that in
coding high definition images, it is desirable that the input vectors
to the vector quantizer be high dimensional to reduce the transmission
bit rate. Thus transform techniques such as DCT (discrete cosine
transform) are needed to reduce the input vector dimension.

II.  The  Proposed  Scheme 

The proposed image coding scheme is summarized here. It first
divides an image into subimage blocks of 8x8 pixels. The DC coefficient
of DCT is subtracted from each block and is coded separately.
A codebook is then designed by the PCVQA for the subtracted blocks
which contain only AC signals. The self-organizing encoding method
[7] is employed to interleave transmission (or storing) of the codewords
and labels. In this way, the codewords will be coded in a more compact
way by JPEG. The basic idea for the self-organizing encoding is that
if a codeword is selected the first time, it is transmitted (or stored).
Otherwise, only its label is transmitted (or stored). Therefore, the
codebook is self-organized in the sense that the first selected codeword
is shifted to the first position in the codebook buffer and all
the codewords above this codeword are moved down. The decoder
will reconstruct a codebook in the same order. Note that only the
codewords that are transmitted (or stored) will be encoded by JPEG.
Thus, without considering the codebook design complexity, this approach
involves even less encoding complexity than JPEG does.
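
The first-use rule of the self-organizing encoding can be sketched as below. This is a simplified illustration under stated assumptions: the codebook buffer is modelled simply as the list of codewords in order of first transmission, the JPEG coding of the transmitted codewords is omitted, and the function names, stream format and toy data are all hypothetical.

```python
def self_organizing_encode(selected_indices, codebook):
    """Turn a sequence of selected codeword indices into a mixed stream.

    Each stream item is ("codeword", vector) the first time a codeword is
    used and ("label", position) afterwards, where position is the order
    in which that codeword was first transmitted.
    """
    position_of = {}     # designed-codebook index -> transmitted position
    stream = []
    for idx in selected_indices:
        if idx not in position_of:
            position_of[idx] = len(position_of)
            stream.append(("codeword", tuple(codebook[idx])))
        else:
            stream.append(("label", position_of[idx]))
    return stream


def self_organizing_decode(stream):
    """Rebuild the codebook in arrival order and recover the vectors."""
    received = []        # codebook as reconstructed by the decoder
    output = []
    for kind, payload in stream:
        if kind == "codeword":
            received.append(payload)
            output.append(payload)
        else:
            output.append(received[payload])
    return output, received


codebook = [(0, 0), (1, 2), (3, 1), (5, 5)]   # hypothetical designed codebook
selected = [2, 0, 2, 3, 0, 2]                  # hypothetical encoder choices
stream = self_organizing_encode(selected, codebook)
decoded, rebuilt = self_organizing_decode(stream)
```

In this toy run the codeword at index 1 is never selected and is therefore never transmitted, which illustrates why only the codewords actually used need to be coded.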

We then examine various numerical techniques, namely, the gradient
descent method, the power method, the eigenvalue shift acceleration,
and the modified Aitken's δ² acceleration, for efficient estimation
of the principal components which are essential to the implementation
of PCVQA. We show that the gradient descent method which uses
the Rayleigh quotient as an estimate for the largest eigenvalue has the
same performance as the power method. The power method with the
eigenvalue shifted by one-third of the predicted largest eigenvalue has
more rapid convergence. And the eigenvalue shift by the Aitken's δ²
acceleration has the most rapid convergence. Consequently, we employ
Aitken's δ² acceleration to implement the proposed compression
scheme. This amounts to a complexity of only three to five times more
than that of JPEG alone, according to our simulation experience. This
implies that the proposed image coding scheme can, in general, code
a color image of 256 x 256 pixels in less than one minute.
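
To make the comparison concrete, the following is a minimal sketch of the power method with a Rayleigh-quotient eigenvalue estimate and an optional eigenvalue shift. It is an illustration only: the function names, the stopping rule, the 2 x 2 example covariance matrix and the use of a first run to supply the "predicted" largest eigenvalue are all assumptions, and Aitken's δ² acceleration is not included.

```python
def mat_vec(a, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def power_method(a, shift=0.0, tol=1e-12, max_iter=10_000):
    """Dominant eigenpair of a symmetric matrix `a` by power iteration.

    The iteration is run on (a - shift*I); the Rayleigh quotient of the
    original matrix serves as the eigenvalue estimate.
    Returns (eigenvalue, unit eigenvector, iterations used).
    """
    n = len(a)
    x = [1.0 / n ** 0.5] * n                     # unit-norm starting vector
    lam = sum(xi * yi for xi, yi in zip(x, mat_vec(a, x)))
    for it in range(1, max_iter + 1):
        y = [yi - shift * xi for xi, yi in zip(x, mat_vec(a, x))]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
        new_lam = sum(xi * yi for xi, yi in zip(x, mat_vec(a, x)))
        if abs(new_lam - lam) < tol:
            return new_lam, x, it
        lam = new_lam
    return lam, x, max_iter


cov = [[4.0, 2.0], [2.0, 3.0]]                   # hypothetical covariance matrix
lam_plain, vec_plain, it_plain = power_method(cov)
# Shift by one-third of the predicted largest eigenvalue, as in the text;
# here the prediction is simply taken from the unshifted run.
lam_shift, vec_shift, it_shift = power_method(cov, shift=lam_plain / 3.0)
```

The shift moves the smaller eigenvalue closer to zero relative to the largest one, which reduces the ratio that governs the convergence rate; in this example the shifted run should need no more iterations than the plain one.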

III.  Concluding  Remarks 

We introduce here an image coding technique which combines
PCVQA, the self-organizing codebook and JPEG. Simulation results
demonstrate that this technique can achieve better performance for
still image compression than JPEG alone. Numerical methods for
efficiently implementing PCVQA are also studied. The proposed scheme,
thanks to much reduced complexity, can be employed to construct
codebooks that are to be retransmitted regularly in HDTV broadcasting.

For each positive integer n, let Π_n be the set of all n x n matrices
of zeroes and ones containing at least one "1". We suppose that a
black-white image which is not all white is represented as an
element M ∈ Π_N for a large enough N. For lossless encoding of
M, we contrast two possible image compression methods. One
method, called template coding, is a multiresolution technique
which finds a string of templates (shapes) from a finite dictionary
that can be used to successively reconstruct M starting from the
matrix [1] in Π_1; the string of templates is then encoded
template-by-template. The other method is the classical image compression
technique known as subpicture coding, in which M is partitioned
into square sub-blocks of a given size which are then encoded
sub-block by sub-block.

We first discuss template coding of $M$. An integer parameter $k$
is fixed, $2 \le k < N$. If $A$ is a square zero-one matrix of order $\ge k$,
we define $C(A)$ (the "core" of $A$) to be the largest square
submatrix of $A$ lying in the upper left corner of $A$ whose order is
divisible by $k$, and we define $P(A)$ (the "projection" of $A$) to be
the matrix we obtain from $C(A)$ by replacing each of the
submatrices in the partitioning of $C(A)$ into $k \times k$ submatrices with
a one or zero depending upon whether the submatrix does or does
not contain a "1". Template coding of $M$ is performed in four
steps: (1) form the matrices $(M^i : i = 1, 2, \ldots, r)$ where $M^1 = C(M)$,
$M^2 = C(P(M^1))$, ..., $M^r = C(P(M^{r-1}))$, $P(M^r) = [1]$; (2) take the
template dictionary $D$ to be the union of the set $\{0,1\}$ and the set
of matrices in $\Pi_k$ that appear in the partitions of the $M^i$ into $k \times k$
submatrices; (3) form the string of templates from $D$ that are seen
as one horizontally scans each of the following in the order
described: the elements of the partition of $M^r$ into $k \times k$
submatrices, the elements of $P(M^{r-1})$ not in $M^r$, the elements of
the partition of $M^{r-1}$ into $k \times k$ submatrices, the elements of
$P(M^{r-2})$ not in $M^{r-1}$, ..., the elements of the partition of $M^1$ into
$k \times k$ submatrices, the elements of $M$ not in $M^1$; (4) encode the
template string template-by-template.

In subpicture coding of $M$, we also fix an integer parameter $k$,
$1 \le k < N$. We form the string consisting of the elements of the
partition of $C(M)$ into $k \times k$ submatrices (scanned horizontally)
followed by the elements of $M$ not in $C(M)$ (scanned
horizontally). We then encode this string entry-by-entry.

We state our results, which indicate that template coding
is preferable to subpicture coding in a certain asymptotic sense. If
$M \in \Pi_n$ and $n > k$, let $B_k[M,t]$ ($B_k[M,s]$) be the minimum total
number of bits that are achievable in template coding (subpicture
coding) of $M$ with integer parameter $k$. Then statements (i) and (ii)
hold:

(i) Given any $k$ and $\epsilon > 0$, there exist $k^* = k^*(k,\epsilon)$ and $N^* =
N^*(k,\epsilon)$ such that $B_{k^*}[M,t] \le (1+\epsilon) B_k[M,s]$ for any $n \ge N^*$ and
any $M \in \Pi_n$.


(ii) Let $S$ be any nonempty subset of the unit square whose
box-counting dimension is less than two, and for any $n$ let $M_n \in \Pi_n$
be the matrix in which an element equals zero if and only if the
corresponding sub-square in the partition of the unit square into
$n^{-1} \times n^{-1}$ sub-squares contains no point of $S$; then, for any $k$, the
ratio $B_k[M_n,t] / B_k[M_n,s]$ converges to zero as $n \to \infty$.

Statement (i) tells us that template coding always yields
compression performance at least as good as subpicture coding.
Statement (ii) gives us a wide class of black-white images for
which template coding outperforms subpicture coding.

We  illustrate  with  an  example.  Template  coding  with 
k=2  applied  to  the  matrix 


0 0 0 0 1 1 1 1
0 0 0 0 1 0 1 0
0 0 0 0 0 1 0 0
0 0 0 0 1 1 0 0
0 0 1 1 0 0 0 1
0 0 1 0 0 0 1 1
1 1 1 1 0 1 0 1
1 0 1 0 1 1 1 1


yields $M^1 = M$, $M^2$ =

0 0 1 1
0 0 1 0
0 1 0 1
1 1 1 1

and $M^3$ =

0 1
1 1

Letting $T_1$, $T_2$ be the templates

0 1         1 1
1 1   and   1 0

respectively, we obtain the string of templates
t = ($T_1$, $T_2$, $T_1$, $T_1$, $T_2$, $T_2$, $T_1$, $T_2$, $T_1$, $T_2$, $T_2$, $T_1$, $T_1$).
(The first entry of t allows one to determine $M^3$,
entries 2-4 allow one to reconstruct $M^2$ from $M^3$, and entries 5-13
allow the recovery of $M$ from $M^2$.) We have $B_2[M,t]$ = 13 bits.
On the other hand, subpicture coding with $k$=2 applied to $M$
gives us the string s = ($Q$, $Q$, $T_2$, $T_2$, $Q$, $Q$, $T_1$, $Q$, $Q$, $T_2$, $Q$, $T_1$,
$T_2$, $T_2$, $T_1$, $T_1$) where $Q$ is the 2x2 zero matrix, from which one concludes
that $B_2[M,s]$ = 25 bits.
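
The worked example can be checked with a short script. This is a minimal sketch under stated assumptions: the matrix order is a power of k (so every core equals the whole matrix), all-zero blocks are left out of the template string because they are implied by the zeros one level up, and bits are counted as the total length of an optimal prefix (Huffman) code over the string. The function names are invented for the illustration.

```python
import heapq
from collections import Counter

K = 2
M = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 1, 1, 1],
]

def blocks(a, k):
    """Scan the k x k sub-blocks of a row by row; each block is a flat tuple."""
    m = len(a) // k
    return [tuple(a[i * k + r][j * k + c] for r in range(k) for c in range(k))
            for i in range(m) for j in range(m)]

def project(a, k):
    """P(A): replace each k x k block by 1 if it contains a 1, else by 0."""
    m = len(a) // k
    flags = [int(any(b)) for b in blocks(a, k)]
    return [flags[i * m:(i + 1) * m] for i in range(m)]

def template_string(a, k):
    """Nonzero blocks of M^r, ..., M^1, coarsest level first."""
    levels = [a]
    while len(levels[-1]) > k:
        levels.append(project(levels[-1], k))
    return [b for level in reversed(levels) for b in blocks(level, k) if any(b)]

def huffman_bits(symbols):
    """Total length of an optimal prefix code for the given symbol string."""
    weights = list(Counter(symbols).values())
    heapq.heapify(weights)
    total = 0
    while len(weights) > 1:
        merged = heapq.heappop(weights) + heapq.heappop(weights)
        total += merged
        heapq.heappush(weights, merged)
    return total

t = template_string(M, K)   # template coding string
s = blocks(M, K)            # subpicture coding string
print(len(t), huffman_bits(t))
print(len(s), huffman_bits(s))
```

Under these assumptions the script gives 13 templates costing 13 bits for template coding and 16 blocks costing 25 bits for subpicture coding.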


* Authors supported by NSF Grant NCR-9003106

Abstract

A straight line $y = mx + b$ has as its digital representation on an
integer grid the set of points $\{(x_i, y_i) : x_i = i,\ y_i = \lfloor m x_i + b \rfloor,\ i = 0, 1, \ldots, n\}$.
It is shown that for a uniform distribution on the set of lines,
the error in estimating the line from its digital representation is
$O(\log n / n)$.

Summary 

For a line $y = mx + b$ the corresponding digital straight line is
defined as those points on or immediately below the line $y = mx + b$,
i.e. the set of points

$$\{(x_i, y_i) : x_i = i,\ y_i = \lfloor m x_i + b \rfloor,\ i = 0, 1, \ldots, n\} \qquad (1)$$

Without loss of generality assume $0 \le m \le 1$, $0 \le b \le 1$. The question
addressed here is: if it is known that the edge is a straight line, how well
can we estimate the line from its digital representation? For simplicity
we define the error at $x$ as $e(x) = |mx + b - m'x - b'|$, where $m'x + b'$ is
the estimated edge, and the error $e$ as

$$e = \max_x \{e(x)\}, \quad 0 \le x \le n \qquad (2)$$

We assume a uniform measure on $\rho$ and $\theta$, the length and angle
respectively of the normal to a straight line, producing a non-uniform
measure on the set of digital straight lines. Generally this gives greater
measure to digital straight lines with larger estimation error, affecting
the order of the error. The expected value of the error $e$ will be shown
to be upper and lower bounded by $O(\log n / n)$.

The order of the error defined in (2) is the same as the order of
$d_{\min}$, the minimum distance to the line of points $(x_i, y_i)$, $i = 0, 1, \ldots, n$, in
(1). We first give a proof of the result that if an arbitrarily small set of
lines can be neglected then the error in estimating a line from its
digitized representation has error $O(1/n)$. More precisely we prove the
following.

Theorem 1. For any function $f(n)$, increasing arbitrarily slowly
with $n$, $P\{d_{\min} < f(n)/n\} > 1 - 8/f(n)$.

Proof: For a line $y = mx + b$, let $d_i = m x_i + b - y_i$, $i = 0, 1, \ldots, n$,
where $y_i$ is defined in (1). Assume for the moment $b = 0$. For $m = p/q$,
$0 < p < q \le n$, $p$ and $q$ relatively prime, then $d_i \in \{k/q,\ k = 0, 1, \ldots, q-1\}$.
Equivalently, distances to the line for points on the array immediately
above the line are in the set $\{k/q,\ k = 1, 2, \ldots, q\}$. Thus $b$ cannot be
increased by more than $1/q$, while keeping the digital straight line the same.
Similarly, for any $b$, its value can range only in an interval of width $1/q$,
for a fixed digital straight line. Thus,

$$d_{\min} = \min_i \{d_i\} \le 1/q \qquad (3)$$
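
The first step of the proof can be checked numerically with exact rational arithmetic. The sketch below is only an illustration: the slope 3/7, the range n = 20 and the function names are arbitrary choices, not values from the paper.

```python
from fractions import Fraction
from math import floor

def digital_line(m, b, n):
    """Grid points (i, floor(m*i + b)) for i = 0..n, as in definition (1)."""
    return [(i, floor(m * i + b)) for i in range(n + 1)]

def residuals(m, b, n):
    """Vertical distances d_i = m*x_i + b - y_i from each grid point up to the line."""
    return [m * i + b - y for i, y in digital_line(m, b, n)]

q = 7
m = Fraction(3, q)          # p/q with p and q relatively prime
n = 20
d = residuals(m, Fraction(0), n)

# With b = 0 the residuals take exactly the values k/q, k = 0..q-1.
print(sorted(set(d)))

# Raising b by less than 1/q leaves the digital line unchanged,
# while raising it by 1/q changes at least one grid point.
same = digital_line(m, Fraction(1, 2 * q), n) == digital_line(m, Fraction(0), n)
changed = digital_line(m, Fraction(1, q), n) != digital_line(m, Fraction(0), n)
print(same, changed)
```

Using `Fraction` avoids floating-point rounding at the integer boundaries, which is where the floor in (1) is sensitive.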

Consider a line with $0 < m < 1$. From number theory, $m$ can be
approximated by $p/q$, $0 < p < q \le n$, $p$ and $q$ relatively prime, such that

$$|m - p/q| < 1/qn \qquad (4)$$

For any $m$ its corresponding $q$ will be the maximum satisfying (4). For
$y = mx + b$ and $y' = (p/q)x + b$, the maximum distance between them is
$\max_x \{|y - y'|\} < 1/q$, since $0 \le x \le n - 1$. Thus for any line $d_{\min} \le 2/q$.

The set of slopes with approximation $p/q$ in (4) is a subset of the
interval $(p/q - 1/qn,\ p/q + 1/qn)$. Its measure (assuming a uniform
distribution on $\theta$) is bounded by $4/qn$. Let $m_q$ denote the set of slopes
with fixed $q$ in the approximation in (4). Its measure is


$$\mu(m_q) \le \varphi(q)\,(4/qn) \le 4/n \qquad (5)$$

where $\varphi(q)$ is the number of values $p$ can take on (and be relatively
prime to $q$).

The measure $\mu$ of all $m$ having $q \le 2n/f(n)$ is

$$\mu < [2n/f(n)]\,(4/n) = 8/f(n) \qquad (6)$$

For $q \ge 2n/f(n)$ we have that $d_{\min} < f(n)/n$, from which the theorem
follows.

The expected error is of the same order as the expectation of
$d_{\min}$. Hence we show the following.

Theorem 2. $O(\log n / n) \le E[d_{\min}] \le O(\log n / n)$

Proof. From the proof of Theorem 1 we have that for any line
$d_{\min} \le 2/q$, with $q$ defined in (4). Thus from (5)

$$E[d_{\min}] \le \sum_{q=1}^{n} (2/q)(4/n) = (8/n) \sum_{q=1}^{n} 1/q = O(\log n / n) \qquad (7)$$

To lower bound $E[d_{\min}]$ consider $m \in m_Q = \{p/q,\ 0 < p < q \le n\}$
with $p$ and $q$ relatively prime. The intercept $b$ can range over an
interval of width $1/q$ without changing the digital straight line. The
measure of all lines bounded from above by $mx + a + 1/q$ and below by
$mx + a$ is

$$\mu_q \ge 1/2q^2 n \qquad (8)$$

which lower bounds the measure of all lines with that digital
representation.

For a line $y = mx + b$, $m \in m_Q$ and $b \in b_Q = \{k/q,\ k = 0, 1, \ldots, q-1\}$,
the ordered sequence $\{d_i = m x_i + b - y_i,\ i = 0, 1, \ldots, n\}$ is unique to the
triplet $(p, q, b)$. Taking the set of digital straight lines determined by
these triplets we get

$$E[d_{\min}] \ge \sum_q \sum_p \sum_b d_{\min}\, \mu_q \ge \sum_q \sum_p \sum_b (1/q)(1/2q^2 n)$$

$$E[d_{\min}] \ge \sum_q \varphi(q)/2q^2 n = (1/2n) \sum_{q=1}^{n} \varphi(q)/q^2 \qquad (9)$$

where $\varphi(q)$ is the number of integers $p < q$ and relatively prime to $q$.
From (9) it can be shown $E[d_{\min}] \ge O(\log n / n)$.
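
The last step relies on the sum of $\varphi(q)/q^2$ growing like $\log n$. The sketch below checks this numerically using the classical estimate that the sum up to $n$ behaves like $(6/\pi^2)\ln n$ plus a constant; that estimate is a standard number-theoretic fact brought in for the illustration, not something stated in the text, and the function names are invented. The sum starts at $q = 2$ because the text's $\varphi(q)$ counts $0 < p < q$.

```python
import math

def totients(n):
    """Euler's totient for 0..n via a sieve."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:                      # p is prime (still untouched)
            for k in range(p, n + 1, p):
                phi[k] -= phi[k] // p
    return phi

def phi_sum(n, phi):
    """Sum of phi(q)/q^2 for q = 2..n, the sum appearing in (9)."""
    return sum(phi[q] / (q * q) for q in range(2, n + 1))

phi = totients(20000)
# Doubling n should add about (6/pi^2) * ln 2 to the sum if it grows like log n.
growth = phi_sum(20000, phi) - phi_sum(10000, phi)
expected = 6.0 / math.pi ** 2 * math.log(2.0)
print(growth, expected)
```

Dividing the sum by $2n$ as in (9) then gives a lower bound of order $\log n / n$, matching the upper bound (7).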

Abstract

A new perceptually relevant entropy-constrained coding
scheme based on the just-noticeable-distortion (JND) level of the
human observer is described and its properties demonstrated.
The JND at each pixel location is defined as the threshold of
detectability of the human visual system (HVS) to errors in
reproducing that pixel. Because of the masking effect of the HVS,
errors below the JND are rendered imperceptible. The JND
is determined empirically as a function of spatial frequency,
local texture and local contrast. A distortion measure is
developed, making essential use of the JND, for a subband coding
environment which attempts to mimic the subjective evaluation
effects of the HVS. This distortion measure employs a
weighted squared-error metric, where the weighting depends
upon the JND value at each pixel position. It essentially
assigns near-zero distortion to subthreshold errors and approximately
squared-error distortion to superthreshold errors. This
perceptual distortion measure was incorporated into a previously
developed design procedure for entropy-constrained subband
coding (ECSBC) schemes based upon training data. We
demonstrate that, compared to use of the conventional squared-error
distortion, significant improvements in subjective image
reconstruction quality can be achieved at low average bit rates
using this perceptual distortion measure.
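
The abstract describes the distortion measure only qualitatively. One hypothetical form with the stated behaviour is a squared error whose weight switches at the JND. The hard 0/1 weight, the function names and the sample numbers below are assumptions for illustration, not the measure actually used by the authors.

```python
def perceptual_distortion(error, jnd, sub_weight=0.0):
    """Weighted squared error; the weight depends on whether |error| exceeds the JND.

    sub_weight is the (assumed) near-zero weight given to subthreshold errors.
    """
    weight = sub_weight if abs(error) <= jnd else 1.0
    return weight * error * error

def mean_distortion(errors, thresholds):
    """Average perceptual distortion over pixels with per-pixel JND thresholds."""
    total = sum(perceptual_distortion(e, t) for e, t in zip(errors, thresholds))
    return total / len(errors)

# A subthreshold error costs nothing; a superthreshold error costs its square.
print(perceptual_distortion(0.5, 1.0), perceptual_distortion(2.0, 1.0))
print(mean_distortion([0.5, 2.0], [1.0, 1.0]))
```

A smoother weight that rises gradually around the threshold would fit the same description; the step is just the simplest choice.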

Summary

Image compression is a very important area of research today,
especially for use in bandwidth intensive applications such
as high-definition television (HDTV) and multimedia systems.
The aim of image compression, or coding, is to minimize the
average distortion, as indicated by a specified fidelity or distortion
measure, for a fixed transmission rate. This can be
accomplished by exploiting any redundancy present in the image,
together with use of an appropriate quantization strategy.

While  there  has  been  extensive  research  directed  toward 
characterizing  the  rate-distortion  performance  of  various  image 
compression  schemes,  almost  all  of  these  studies  have  been  based 
on  use  of  the  mean-squared  error  fidelity  criterion.  For  exam¬ 
ple,  previously  reported  results  for  entropy-constrained  subband 
coding  (ECSBC)  have  shown  good  quality  image  reconstruc¬ 
tions,  as  well  as  excellent  rate-distortion  performance,  using  a 
minimum mean-squared error distortion criterion [1]. Minimum
mean-square error (MMSE), however, is not the best measure
of human psychophysical evaluation because it does not take
into consideration the relative visibility of coding artifacts. It
is becoming increasingly necessary to define distortion measures
based on subjective evaluation criteria [2], which will allow
minimization of the perceived distortion, rather than mean-square
error, for a desired transmission rate. Determination of a
perceptually based distortion metric has therefore been a subject of
renewed interest in the image coding literature with early work
described in [3].

In this paper we focus on the development and use of a
perceptually relevant distortion measure for use in a subband
coding environment which better mimics the subjective evaluation
properties of the HVS than the squared-error metric.
This distortion measure is then incorporated in a straightforward
manner into previously developed design procedures for
ECSBC schemes based upon training data, as described in [1]
for the specific case of mean-square distortion.

The perceptual distortion measure is based upon the concept
of a just-noticeable-distortion (JND) level at a given pixel
location in the reconstructed fullband image when errors occur
in different subbands. This results in a spatially varying perceptual
threshold $T_i(x)$ indicating the JND due to errors at pixel
site $x$ in the $i$'th subband. The evaluation of $T_i(x)$ is determined
empirically, similar to the procedure described in [4], and
depends upon spatial frequency (subband), local texture and local
contrast. However, unlike the coding approach in [4], where
the perceptual threshold was used to simply set the stepsize of
a uniform threshold scalar quantizer, in this work the perceptual
threshold is used to describe a distortion measure which is
then used in the design of ECSBC schemes. By making use of
the appropriately adapted ECSBC design procedure reported
in [1], a variety of scalar and vector quantization schemes can
be investigated for encoding the subband components. This
includes entropy-constrained vector quantization (ECVQ) [5]
as well as entropy-constrained predictive vector quantization
(ECPVQ) schemes [6]. Optimum bit allocation is provided as
an integral part of this design approach.

A number of results are presented illustrating the superior
subjective performance associated with the use of this perceptual
distortion measure compared to the conventional squared-error
distortion criterion. Suggestions for further extension of
this approach are provided.

Introduction

Subband  coding  (SBC)  is  an  attractive  image  coding 
scheme.  For  compression  the  subband  signals  must  be 
quantised.  Vector  quantisation  (VQ)  of  the  subband  sig¬ 
nals  exploits  the  statistical  bindings  between  the  samples. 
The  performance  of  a  particular  vector  quantiser  is  highly 
dependent  on  how  well  its  codebook  matches  to  the 
source  statistics.  Investigations  concerning  VQ  perfor¬ 
mance  require  multivariate  source  models. 

In  this  respect  the  SIRP  models,  which  will  be  recalled  in 
the  following  section,  have  many  interesting  properties. 
SIRP  model  sources  can  be  efficiently  quantised  using  lat¬ 
tice  VQ  which  reduces  implementation  complexity  drasti¬ 
cally  compared  to  VQ  with  unstructured  codebooks.  The 
known  designs  employ  contour-gain  separated  VQ  (sim¬ 
ilar  to  [1])  and  a  lattice  structured  codebook  for  quanti¬ 
sation  of  the  contour  vector. 

Besides,  traditional  VQ  can  benefit  from  SIRP  models  by 
training  with  pseudo  random  data  generated  according  to 
the  model  [2]. 


SIRP  models 

A random process is called a spherically invariant random
process (SIRP), iff every joint probability density
function (pdf) in $n$ variables $p_n(x)$ is a function of the
quadratic form $x^T M^{-1} x$ only:

$$p_n(x) = f_n(x^T M^{-1} x), \qquad (1)$$

where $M$ denotes the corresponding $n \times n$ covariance matrix.
The function $f_n$ describes the shape of the distribution.

This  means  that  all  contours  of  equal  probability  density 
are  multidimensional  ellipsoids.  In  particular,  all  contour 
lines  of  equal  density  of  any  bivariate  distribution  taken 
from  two  samples  of  a  SIRP  are  ellipses. 

Du  [3]  gave  a  comprehensive  SIRP  model  for  image  sig¬ 
nals  which  uses  generalized  gaussian  functions  to  describe 
the  univariate  marginal  distributions.  He  showed  further 
that  image  blocks  (after  subtraction  of  the  sample  mean) 
can  be  modeled  as  realizations  of  a  SIRP.  It  will  be  shown 
here  that  image  blocks  in  the  subband  domain  can  be 
modeled  as  SIRPs  as  well. 


Subband  image  statistics 

Numerous still pictures (512 x 512 pel, 8 bit quantization)
were filtered with separable quadrature mirror
filters (QMF). The resulting images were divided into
blocks of size 4x4. In the baseband the sample mean
of the blocks was subtracted. These blocks were then
transformed by a principal axes transformation $H$ into
uncorrelated vectors with unit variances:

$$y = Hx \quad \text{with} \quad H M H^T = I, \qquad (2)$$

where $M$ denotes an estimate of the covariance matrix.
If the hypothesis of spherical invariance (1) is true, the
vectors in the principal domain obey the pdf

$$p_n(y) = f_n(y^T y). \qquad (3)$$

Thus the pdf of $y$ depends only on the radius $r = \|y\|$
of the vectors. This hypothesis has been tested with a
$\chi^2$-test of goodness of fit. The results have shown that
the spherical symmetry is much stronger in the subbands
than in the original domain. It even turned out that
spherically symmetric distributions are better suited (by
an order of magnitude) than those distributions related
to the common model of statistical independence in the
principal axes domain.
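
The step from the SIRP definition (1) to the pdf in the principal domain can be spelled out as follows. This is a short derivation under the assumption that $H$ is invertible and normalised so that $H M H^T = I$ (the usual whitening condition); the constant Jacobian factor $|\det H|^{-1}$ is absorbed into the proportionality sign.

```latex
x = H^{-1} y
\;\Longrightarrow\;
x^T M^{-1} x
  = y^T H^{-T} M^{-1} H^{-1} y
  = y^T \left( H M H^T \right)^{-1} y
  = y^T y = r^2 ,
\qquad\text{hence}\qquad
p_n(y) \;\propto\; f_n\!\left( y^T y \right).
```

The density of the transformed block therefore depends on $y$ only through its radius $r$, which is the property checked by the goodness-of-fit test.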

This motivates the use of lattice VQ in the subband domain,
thus combining the advantages of subband coding with
the performance of VQ, without the need for the storage
of many different codebooks.

We discuss coding of 2D data using a recursive framework
for noncausal Gauss Markov random fields (GMRF) defined
on finite lattices. This framework exploits to advantage the
structure of GMRFs, providing the means to achieve recursive
optimal processing while preserving the noncausality
of the field.

The compression scheme uses noncausal prediction coupled
to vector quantization (VQ). The noncausal prediction
first fits a noncausal GMRF to the data, then whitens the
data by an inverse filtering type operation, and finally vector
quantizes the prediction error field. In this paper, we
explain the details of the noncausal prediction. Lack of space
prevents us from discussing the parameter estimation algorithm
that is needed to fit a 2D model to the data; see [1].

GMRF  Recursive  Structure 

Important  in  the  coding  of  GMRFs  is  the  issue  of  pa¬ 
rameterization.  This  leads  to  the  question  of  when  is  a 
positive  definite  matrix  the  covariance  of  a  GMRF?  Partial 
answers  are  available  only  in  very  special  cases.  In  general, 
for  GMRFs  on  finite  lattices,  it  is  not  possible  to  answer  the 
question  directly.  It  turns  out  that  the  right  way  to  pose  it 
is  in  terms  of  the  inverse  of  the  covariance  matrix  which  we 
refer  to  as  the  potential  matrix,  see  [2]  for  details. 

Let $\{x_{i,j}\}$, $1 \le i,j \le N$, represent the 2D field on a finite
lattice (taken as a square, for simplicity). Woods' [3] minimum
mean square error representation of a homogeneous
first order GMRF (nearest neighbors) is

$$x_{i,j} = \beta_h (x_{i,j-1} + x_{i,j+1}) + \beta_v (x_{i-1,j} + x_{i+1,j}) + e_{i,j}, \qquad (1)$$

where $\beta_h$ and $\beta_v$ are the strengths of the neighbor horizontal
and vertical field interactions, respectively. We call
these the field potentials. Collecting all equations, taking
care of boundary conditions (b.c.) (here we assume
Dirichlet zero boundary conditions; see [2] for general
b.c.), we get

$$Ax = e \qquad (2)$$

where the potentials are collected in the matrix $A = I \otimes
B + H \otimes C$, and $\otimes$ is the Kronecker product. The vector
$x = \mathrm{vec}[x_i]$, where the $N$ vectors $x_i$ collect the intensities
of the pixels of the $i$th row. $I$ is the $N \times N$ identity matrix,
$B = I_N - \beta_h H_N$ and $C = -\beta_v I_N$, $H$ is an $N \times N$ matrix of
zero entries, except the upper and lower diagonal (all ones),
and $I_N$ and $H_N$ are like $I$ and $H$ but of dimension $N$.

The noise $e$ has correlation $\Sigma_e = \sigma^2 A$. Apart from the
normalizing factor of $\sigma^2$, the inverse of the covariance $\Sigma_x$ of $x$
is then the potential matrix $A$.

* Work partially supported by ONR grant # N00014-91-J-100I


By Cholesky factorization, $A = U^T U$. Equation (2) gives

$$Ux = w \qquad (3)$$

where the covariance of $w$ is $\sigma^2 I$. The Cholesky factor $U$ is
not a full matrix. It is block diagonal with band $N + 1$. The
diagonal and the upper diagonal blocks of $U$ are obtained
from the iterates of a Riccati type equation. In [2], the
convergence behavior of this iterative scheme is studied. For
practical purposes, one may stop it after less than 10 iterations,
considerably reducing the associated computational
effort.
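
To make the structure concrete, the sketch below builds the potential matrix $A = I \otimes B + H \otimes C$ for a small lattice and factors it with a plain dense Cholesky routine. It does not implement the Riccati-type block recursion of [2]; the lattice size, the potentials 0.2 and the function names are assumptions chosen so that $A$ is positive definite.

```python
def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def neighbor(n):
    """Zero matrix except ones on the first upper and lower diagonals."""
    return [[float(abs(i - j) == 1) for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def potential_matrix(n, bh, bv):
    """A = I (x) B + H (x) C with B = I - bh*H and C = -bv*I."""
    i_n, h_n = identity(n), neighbor(n)
    b = [[i_n[r][c] - bh * h_n[r][c] for c in range(n)] for r in range(n)]
    c_mat = [[-bv * i_n[r][c] for c in range(n)] for r in range(n)]
    ib, hc = kron(i_n, b), kron(h_n, c_mat)
    size = n * n
    return [[ib[r][c] + hc[r][c] for c in range(size)] for r in range(size)]

def cholesky_upper(a):
    """Upper-triangular U with A = U^T U (dense textbook Cholesky)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = s ** 0.5 if i == j else s / low[j][j]
    return [[low[j][i] for j in range(n)] for i in range(n)]

N = 3
A = potential_matrix(N, 0.2, 0.2)   # hypothetical potentials with |bh| + |bv| < 1/2
U = cholesky_upper(A)
# U keeps the band of A: nothing beyond N positions above the diagonal.
outside_band = max(abs(U[i][j]) for i in range(N * N) for j in range(N * N) if j - i > N)
print(outside_band)
```

The banded shape of $U$ is what makes the whitening step $Ux = w$ a recursive, one-sided operation even though the field model itself is noncausal.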

2D  Coding 

To code 2D data, we need the field parameter values
$\beta_h, \beta_v, \sigma^2$. In [1], we analyze the parameter space of GMRFs
and study their maximum likelihood (ML) estimation.

We have used this to code two dimensional data. The
basic structure of the (lossy) codec is the following: (i) the
global mean is subtracted from the 2D data, which is then
input to an ML estimator; (ii) a Cholesky factorization
of $A$ leads to the unilateral representation of the field;
(iii) the field is whitened, leading to the error field; (iv) the
error field is vector quantized; (v) lossless entropy type coding
can be used to achieve further compression. When applied
to image data, we have verified that we can get a
compression ratio higher by a factor of 3 to 10 than DCT
based techniques. This procedure and modifications to it
are presently under study.

References 

[1] Nikhil Balram and José M. F. Moura. Noncausal Gauss
Markov random fields: Parameter structure and estimation.
Technical report, LASIP, Department of Electrical and Computer
Engineering, Carnegie Mellon University, April 1991.
Accepted for publication after minor revisions, 45 pages,
revised February 1992.

[2] José M. F. Moura and Nikhil Balram. Recursive structure
of noncausal Gauss Markov random fields. IEEE Transactions
on Information Theory, IT-38(2):334-354, March
1992.

[3] J. W. Woods. Two-dimensional discrete Markovian fields.
IEEE Trans. Inform. Theory, IT-18:232-240, 1972.




An  Optimally  Bit  Allocated  Wavelet  Pyramid  Image  Coding  System 


Jie  Chen  and  Shuichi  Itoh 

University  of  Electro-Communications,  Chofu,  Tokyo  182,  Japan 


Abstract 

Reconstruction error properties for a wavelet pyramid image coding
system are described. It is shown that when an optimal bit
allocation scheme is adopted, the reconstruction noises and the
quantization noises of the wavelet pyramid coding system become
regular, and the reconstruction noises can be approximated by
stationary white noises. Based on the error property analysis,
an optimal bit allocation scheme with respect to the minimum
reconstruction-mean-square-error (RMSE) criterion is given. The
system reconstruction distortion at a given bit rate $R$ is proved to
be directly proportional to $2^{-2R}$. Experimental results are given.

Summary

For a $J$ stage discrete orthonormal wavelet pyramid image
coding system [2], let $\{P_J, (D_j^1)_{1 \le j \le J}, (D_j^2)_{1 \le j \le J}, (D_j^3)_{1 \le j \le J}\}$ be
the wavelet decompositions of an input image $P_0$ and let
$\{\epsilon_J, (\epsilon_j^1)_{1 \le j \le J}, (\epsilon_j^2)_{1 \le j \le J}, (\epsilon_j^3)_{1 \le j \le J}\}$ be their quantization MSE's,
respectively. Furthermore, let $e_j$ denote the reconstruction MSE
at the $j$-th layer of the pyramid. Based on the orthonormality of the
discrete orthonormal wavelets and signal processing theory, the
reconstruction MSE at a layer is given as

+  (1) 

iaj  kail 

Theorem 1 In the wavelet pyramid image coding system, if the
quantizers which minimize the system reconstruction MSE at a
given bit rate are employed, then the quantization MSE's and the
reconstruction MSE's at every layer of the wavelet pyramid satisfy
the following equations:


$$\epsilon_j^k = e_j, \qquad (2)$$

$$\epsilon_J = e_J, \qquad (3)$$

$$e_{j-1} = 4 e_j = 4^{J-j+1} e_J, \qquad (4)$$

where $1 \le k \le 3$, $1 \le j \le J$. Furthermore, if the quantization
noises  are  cross-uncorrelated  and  white,  then  the  reconstruction 
noise  at  each  layer  is  also  white. 

Theorem 1 indicates a kind of regularity about the quantization
noises and the reconstruction noises. The regularity is reflected
in at least three aspects: 1) The quantization MSE's at the same
layer of the wavelet pyramid are equal to each other, and also
equal to the reconstruction MSE at the same layer. 2) The quantization
MSE's or the reconstruction MSE's at two successive layers
are related by a factor 4 in quantity. 3) If the quantization noises
are cross-uncorrelated and white, then the reconstruction noises
at all layers will be white. This regularity should be useful for
practical applications such as post-processing of the reconstructed
image, progressive transmission and so on. Theorem 1 has
also simplified the estimation of the reconstruction MSE's or the
quantization MSE's, since only one quantization MSE needs
to be known.

For $1 \le k \le 3$, $1 \le j \le J$, if we assign $R_j^k$ bits to each
component of sub-images $D_j^k$ and $R_J$ to $P_J$, then the optimal bit
allocation problem can be formulated as

j  s 

minimize  £o  =  +  E  E  (5) 

Mail 

subject  to  R=  R^,  (6) 

where $A_j^k$ (or $A_J$) denote quantization factors. The optimal
solutions are obtained by using Lagrange multipliers:

=  R  +  +  (7) 

Rj  =  R-i-J-l{i-~)  +  \log,^,  (8) 

where  K  is  given  by 

=  (A’7)^  n  (9) 

j=l fc=l 

Furthermore,  the  minimum  MSE  of  the  system  reconstruction  is 
given  as 

£(R)  =  min£o  =  25<‘-^>A'2'’^  (10) 

which is directly proportional to $2^{-2R}$.
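
The Lagrange-multiplier argument can be illustrated with the textbook high-resolution model in which a subband quantized at $R_i$ bits per sample has distortion $A_i 2^{-2R_i}$. The sketch below is a generic version under that assumption, not the paper's equations (5) to (10), whose constants did not survive in this copy; the subband weights and factors are hypothetical.

```python
import math

def allocate_bits(weights, factors, rate):
    """Minimise sum w_i * A_i * 2^(-2 R_i) subject to sum w_i * R_i = rate.

    weights: fraction of all samples in each subband (must sum to 1).
    factors: quantization factors A_i of the assumed distortion model.
    Returns the per-subband rates and the resulting total distortion.
    """
    log_gm = sum(w * math.log2(a) for w, a in zip(weights, factors))
    rates = [rate + 0.5 * (math.log2(a) - log_gm) for a in factors]
    distortion = sum(w * a * 2.0 ** (-2.0 * r)
                     for w, a, r in zip(weights, factors, rates))
    return rates, distortion

# Hypothetical 2-stage pyramid: three detail bands per level plus the coarse image.
weights = [1 / 4] * 3 + [1 / 16] * 3 + [1 / 16]
factors = [1.0, 1.5, 0.5, 6.0, 8.0, 3.0, 200.0]

rates_2, d_2 = allocate_bits(weights, factors, 2.0)
rates_3, d_3 = allocate_bits(weights, factors, 3.0)
print(d_3 / d_2)
```

In this model every extra bit of average rate divides the minimum distortion by four, which is the $2^{-2R}$ behaviour stated above. The closed form ignores the constraint that rates be non-negative, so very small factors can yield negative $R_i$.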

A  wavelet  pyramid  image  coding  system  composed  of  10-tap 
W-QMF’s  and  an  optimally  bit  allocated  uniform  quantizer  was 
implemented  and  the  experimental  results  are  shown  in  Fig.  1. 

References 

[1] I. Daubechies, "Orthonormal Bases of Compactly Supported
Wavelets," Communications on Pure and Applied Mathematics,
Vol. 41, No. 7, pp. 909-996, 1988.

[2] S. Mallat, "A Theory for Multiresolution Signal Decomposition:
The Wavelet Representation," IEEE Trans. on PAMI,
Vol. 11, No. 7, pp. 674-693, July 1989.

[3] P. H. Westerink, D. E. Boekee, J. Biemond and J. W. Woods,
"Subband Coding of Image using Vector Quantization," IEEE
Trans. on Commun., Vol. 36, No. 6, pp. 713-719, 1988.


Fig. 1 SNR (dB) versus bit rate (bpp) for 256 by 256 pixel image "Lena":
(a) SBC+SQ; (b) adaptive DCT; (c) SBC+spatial differential
VQ; (d) SBC+adaptive DPCM; (e) Wavelet+SQ+Huffman; (f)
entropy of Wavelet+SQ, where dashed lines are taken from [3].

SUMMARY

Rotationally  invariant  (RI)  trellis  codes  are  im¬ 
portant  whenever  the  modulation  signal  set  has  a  rota¬ 
tional  symmetry  and  the  transmission  system  can  intro¬ 
duce  a  phase  rotation.  Rotational  invariance  means  that 
a  trellis  code  is  closed  under  rotation  of  the  individual 
elements  of  the  signal  set  onto  which  it  is  mapped.  In 
this  paper  we  look  at  this  "rotation"  as  an  "isometry 
sequence"  under  which  the  code  is  invariant.  We  con¬ 
centrate  on  trellis  codes  that  can  be  described  as  the  orbit 
of  a  group  of  isometry  sequences  acting  on  the  cosets  of 
a  lattice  partition  in  Euclidean  space. 

An isometry, $T$, of a trellis code, $C$, is a map
such that if $c_1$ and $c_2$ are codewords in $C$, $\|c_1 -
c_2\| = \|T(c_1) - T(c_2)\|$, $T(c_i) \in C$. The distance function
is the Euclidean distance on the individual components
of each codeword, given by the one-to-one
map between codeword labels and the modulation signal
set. A coordinate isometry is an isometry such that
$\forall i$, $\|[T(c_1) - T(c_2)]_i\| = \|[c_1 - c_2]_i\|$. Note that the shift
operator is an isometry that is not a coordinate isometry.
A symbolic dynamic group, $S$, is a subshift [1] over a finite
group, that itself forms a group in sequence space by applying
the group operation coordinate-wise. The group
$S$ is guaranteed to be a subshift of finite type, conjugate
to a full shift, which in part implies that it is generated
by a deterministic, labeled directed graph that admits a
sliding window inverse [2].

To describe a geometrically uniform trellis code
$C$ [4], begin with the set of isometries, $U$, that map cosets
of a particular constellation partition onto themselves. This
set forms a finite (non-abelian) group under composition.
Describe a symbolic dynamic group, $S$, over $U$; $S$ is generated
by a graph that is referred to as an isometry graph
of the code. The trellis code, $C$, is then the orbit of an initial
sequence $c_0$ (a sequence of points in Euclidean space),
under the action of $S$. The code $C$ can then be viewed as a
generalization of a Slepian group code [3, 4]. Note that if
$c_0$ is a constant sequence, the encoder can be obtained by
taking the action of each edge label of the graph of $S$ on
the "point" $c_0$. However, the resulting graph will in general
not be minimal (i.e., the graph that generates $C$ may
be smaller than the graph that generates $S$). Rotational
invariance now corresponds to the orbit of a symbolic
group $S$ that includes the "all rotations" sequence.

We demonstrate these ideas by concentrating on
maps of the form $Ac + b$, where $c \in C$, $A : C \to C$ is an
invertible matrix and $b \in Z_2 \times Z_4$. These maps, which
operate on the labels of the QAM constellation, form a
group of 32 elements, U. Each map in U induces an
isometry on the QAM signal set (the standard 8-way partition
of $Z^2$ under an isometric labeling [4, 5]). The code
C is the orbit of an isometry graph over U. This implies
that the graph of the encoder is embedded in the
isometry graph, since all codewords can be generated as
the action of the isometries on the "all-zeros" sequence.
In other words, if the isometry graph is only labeled
with the b's (i.e., let A be the identity map), the resulting
graph must be reducible to the graph of the encoder for
C. For example, consider the following class of rotationally
invariant trellis codes [5]: let the generator G be the
rate 1/2 convolutional code over $Z_4$ (the integers modulo
4) given by $[1 - D, G_p(D)]$. The input sequence is
also over $Z_4$, but the output is taken as $(m, p) \in Z_2 \times Z_4$,
where m is the most significant bit generated by the $1 - D$
term, and p is the output from $G_p(D)$. The outputs of the
code are then mapped to the 8 cosets. Two specific examples
will be presented: a 4-state code with generator
$[1 - D, 2 - D]$, that has a free Euclidean distance of 3
and an 8-state isometry graph, and an 8-state code with
generator $[1 - D, 2 + D + 2D^2]$, that has a free Euclidean
distance of 5 and also an 8-state isometry graph. The
latter code is equivalent to the 8-state RI trellis code used
in the CCITT V.32 standard [6], while the first demonstrates
an example where the isometry graph is larger
than the graph of the encoder.
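
The effect of the generator $[1 - D, G_p(D)]$ over $Z_4$ can be illustrated with a small sketch. The code below is not the authors' encoder: the function names, the start-up rule (`init`) and the reading of "adding 1 to p" as the label-level image of a 90° rotation are assumptions made for illustration. It encodes a $Z_4$ input with the two generators quoted above and checks that adding the constant 1 to every input symbol leaves m unchanged and adds 1 (mod 4) to every p.

```python
def encode_z4(u, gp, init=0):
    """Encode a Z4 input sequence with generator [1 - D, Gp(D)].

    u    : iterable of input symbols in {0, 1, 2, 3}
    gp   : coefficients of Gp(D), lowest degree first, e.g. (2, -1) for 2 - D
    init : value assumed for every input before time 0 (hypothetical start-up rule)
    Returns a list of (m, p) pairs with m in Z2 and p in Z4.
    """
    depth = max(len(gp), 2)            # taps needed by both 1 - D and Gp(D)
    past = [init % 4] * (depth - 1)    # most recent past input first
    out = []
    for x in u:
        window = [x % 4] + past
        a = (window[0] - window[1]) % 4                  # output of the 1 - D term
        m = a >> 1                                       # its most significant bit
        p = sum(g * w for g, w in zip(gp, window)) % 4   # output of Gp(D)
        out.append((m, p))
        past = window[:-1]
    return out


def rotate_input(u):
    """Add the constant 1 (mod 4) to every input symbol."""
    return [(x + 1) % 4 for x in u]


u = [0, 1, 3, 2, 2, 1, 0, 3]
for gp in [(2, -1), (2, 1, 2)]:        # 2 - D and 2 + D + 2D^2
    base = encode_z4(u, gp, init=0)
    rot = encode_z4(rotate_input(u), gp, init=1)
    same_m = all(m0 == m1 for (m0, _), (m1, _) in zip(base, rot))
    shifted_p = all((p0 + 1) % 4 == p1 for (_, p0), (_, p1) in zip(base, rot))
    print(gp, same_m, shifted_p)
```

The check works because $1 - D$ maps a constant sequence to zero, while both $2 - D$ and $2 + D + 2D^2$ map the constant 1 to 1 modulo 4.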

On 90° Rotationally Invariant Lattice Codes

J. A. Sheppard and A. G. Burr


Introduction 

A key step in the design of a lattice code is the design of an
n-dimensional constellation: having selected a suitable lattice,
the designer must choose a finite subset of points, perhaps with
some translation or rotation, to define a set of code words with
the required properties.

It is well known that the SNR performance of an n-dimensional
code is partly determined by the shape of the region of n-space
occupied by the code. The maximum shape gain is given by an
n-sphere, but this is difficult to implement and leads to a
constituent 2-dimensional constellation with a high peak to
average power ratio [5]. Various other techniques have been
suggested, including Voronoi Constellations [1, 2] and Shell
Constructions [3], but none of these works addresses the problem
of rotational invariance.

In a practical system, the receiver must recover the correct phase
of the constellation. If phase symmetries exist then the receiver
may lock to the wrong phase. Thus the code must either have no
phase symmetries, or be immune to any rotations resulting from
such symmetries. The latter type, known as Rotationally
Invariant codes, are attractive since phase symmetries can make
phase recovery easier and more reliable.

This paper outlines techniques for designing rotationally
invariant codes using Leech and Sloane's constructions A and B
[4]. These lattices are either optimally dense or offer a good
performance/complexity trade-off in up to 32 dimensions, and
their simplicity relative to other lattice constructions makes them
worthy of consideration in higher dimensions.

Construction  A  Lattices 

Construction A forms a lattice from the union of cosets of the
lattice of even integers, in which the coset leaders are the words
of a linear binary code. If this is offset by the vector
$(\tfrac12, \tfrac12, \ldots, \tfrac12)$ then the points lie on the half integer grid,
$Z^n + (\tfrac12, \ldots, \tfrac12)$. Thus a two dimensional constituent
constellation consists of four subsets of points:
$2Z^2 + (\tfrac12, \tfrac12)$, $2Z^2 + (\tfrac12, \tfrac12) + (0, 1)$,
$2Z^2 + (\tfrac12, \tfrac12) + (1, 0)$, and $2Z^2 + (\tfrac12, \tfrac12) + (1, 1)$.
This is illustrated in figure 1.


Figure 1: Decomposition of 2-D constellation (panels: example of
2-dimensional constellation; four subsets of points).


In the encoder, some of the data bits generate a code word from
the constituent linear binary code, which is used to define the
sequence of subsets to be transmitted. The remaining bits are
mapped directly onto points within the required subsets.

Rotational invariance is achieved by regarding the subsets as
rotations of $2Z^2 + (\tfrac12, \tfrac12)$ rather than translations. Data bits
mapped onto point numbers will clearly be unaffected by 90°
rotations, and only the bits used to determine the subset
sequence need special attention.
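
The claim that the four subsets are rotations of one another can be checked numerically. The following is a minimal sketch, not taken from the paper: the doubled-coordinate representation and the function names are assumptions chosen to avoid floating-point half-integers. It records which subset each subset is carried to by a 90° rotation.

```python
def subset_label(X, Y):
    """Subset index (i, j) of a point on the half-integer grid.

    Points are stored in doubled coordinates, so the point (x, y) of
    Z^2 + (1/2, 1/2) is given as odd integers (X, Y) = (2x, 2y).
    The label (i, j) means the point lies in 2Z^2 + (1/2, 1/2) + (i, j).
    """
    return (((X - 1) // 2) % 2, ((Y - 1) // 2) % 2)


def rotate90(X, Y):
    """Rotate a point by 90 degrees counter-clockwise about the origin."""
    return (-Y, X)


# An 8 x 8 patch of the half-integer grid, in doubled coordinates.
points = [(X, Y) for X in range(-7, 8, 2) for Y in range(-7, 8, 2)]

transitions = {}
for pt in points:
    src = subset_label(*pt)
    dst = subset_label(*rotate90(*pt))
    transitions.setdefault(src, set()).add(dst)

for src in sorted(transitions):
    print(src, "->", sorted(transitions[src]))
```

Every subset is mapped onto exactly one other subset, and the four subsets form a single cycle under repeated rotation, which is why only the subset-selecting bits need differential treatment.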

It can be shown that if the constituent binary code contains the
words (1, 1, ...) and (0, 1, 0, 1, ...) then a 90° rotation of all the
symbols will result in another valid code word. It is also possible
to list the $2^k$ words of the binary code so that a 90° rotation
corresponds to a cyclic shift of $2^{k-2}$ places down the list. Thus
the data bits can be used to specify the position in the list of the new
binary code word relative to the previous word, and the lattice
code becomes rotationally invariant.
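
Word-level differential encoding of the list position can be sketched as follows. This is an illustration under stated assumptions rather than the authors' encoder: the list length of 16 (k = 4), the reference position `start` and all function names are hypothetical, and a 90° rotation is modelled only as a cyclic shift of a quarter of the list, as described above.

```python
def diff_encode(data, list_len, start=0):
    """Map data symbols to list positions relative to the previous position."""
    idx, out = start, []
    for d in data:
        idx = (idx + d) % list_len
        out.append(idx)
    return out


def diff_decode(indices, list_len, start=0):
    """Recover data symbols as differences of successive list positions."""
    prev, out = start, []
    for idx in indices:
        out.append((idx - prev) % list_len)
        prev = idx
    return out


def rotate(indices, list_len, quarter_turns=1):
    """Model a rotation as a cyclic shift of list_len / 4 places per quarter turn."""
    shift = quarter_turns * (list_len // 4)
    return [(i + shift) % list_len for i in indices]


LIST_LEN = 16                      # 2^k words with k = 4 (illustrative value)
data = [3, 0, 7, 15, 2]
tx = diff_encode(data, LIST_LEN)
rx = rotate(tx, LIST_LEN)          # receiver locked to the wrong phase
print(diff_decode(tx, LIST_LEN))   # original data
print(diff_decode(rx, LIST_LEN))   # only the first symbol is disturbed
```

Because the decoder uses differences of positions, a constant shift cancels for every symbol after the first, whose reference position was not rotated.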

Construction  B  Lattices 

In a Construction B lattice code, the two dimensional
constellation is divided into sixteen subsets and two binary codes
are used to define the sequence of subsets. Rotational
invariance can be achieved with two sets of word-level differential
encoding, using similar techniques to those outlined above.
However, space does not allow a detailed explanation here.

Conclusions 

Techniques for designing 90° rotationally invariant construction
A and B lattice codes have been outlined above. These have
potential applications both in the design of lattice codecs, and in
multidimensional trellis codes.

1 Introduction

In order to account for carrier phase instabilities especially on
satellite or mobile links, several proposals have been made to define
rotationally invariant coded modulation. They were based on
multidimensional or nonlinear convolutional codes, on separate
encoding of the I- and Q-coordinates, or on multilevel block codes,
especially with Reed-Muller codes as component codes.

This contribution describes a semi-algebraic approach with multilevel
convolutional codes that leads to schemes with considerably
low complexity. The construction guarantees 90°-invariance
of the code, but not yet of the information symbols themselves. To
this end, a special differential en/decoder structure has been developed.

2 Conditions for the binary convolutional component codes

Assuming a binary set partitioning of the $2^m$-QAM, with a labelling
that is chosen to be 90°-invariant from the third partition
label on, one obtains the following conditions:

I The all-ones sequence must be a valid code sequence of code (1):
$(\ldots, 1, 1, \ldots, 1, \ldots) \in A^{(1)}$

II All valid code sequences of code (1) must be valid code sequences
of code (2), too: $A^{(1)} \subset A^{(2)}$

III No conditions for $A^{(j)}$, $j = 3, \ldots$

3  Differential  en-  and  decoding 

The  modulo-4  differential  decoder  is  located  after  the  multistage 
convolutional  decoder.  Otherwise  the  noise  power  would  be  dou¬ 
bled  at  the  input  of  the  differential  decoder,  significantly  reducing 
the  achievable  coding  gain. 

The modulo-4 differential encoder is located between the encoding
stages one and two (see Fig.). It can be shown that this requires
a systematic second-level code.

4  The  semi-algebraic  construction 

As outlined in section 2, the all-ones sequence has to be a valid
code sequence of code (1). For $k^{(1)} = 1$ a code with all generators
having an odd weight obviously fulfills this condition.¹ For
$k^{(1)} > 1$, the all-ones code sequence can be obtained if there is
the possibility of creating odd weighted generators by combining
some rows (by means of the information sequence) of the Forney
matrix of code (1). This, e.g., is fulfilled if one row consists only
of odd weighted generator polynomials or if the whole code is only
composed of odd weighted generators.
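
The odd-weight condition for a code with one information bit per frame can be illustrated with a short sketch. It is not the authors' construction: the encoder below is a generic feed-forward encoder, reading generators as octal tap strings is an assumption, and the steady-state start-up (`init=1`) is a hypothetical convenience. Driven by the all-ones input, each output bit equals the weight of its generator modulo 2.

```python
def conv_encode(bits, gens_octal, init=0):
    """Rate-1/n feed-forward binary convolutional encoder (one input bit per frame).

    bits       : input bits
    gens_octal : generators as octal strings, e.g. ("4", "7", "7")
    init       : value assumed for all inputs before time 0
    Returns one output tuple per input bit.
    """
    taps = [[int(b) for b in bin(int(g, 8))[2:]] for g in gens_octal]
    span = max(len(t) for t in taps)
    taps = [[0] * (span - len(t)) + t for t in taps]   # left-pad to equal length
    past = [init] * (span - 1)                         # most recent input first
    frames = []
    for x in bits:
        window = [x] + past
        frames.append(tuple(sum(t * w for t, w in zip(tp, window)) % 2
                            for tp in taps))
        past = window[:-1]
    return frames


# All generators of odd weight: the all-ones input yields the all-ones output.
print(conv_encode([1] * 6, ("4", "7", "7"), init=1))

# Counter-example: generator 5 (binary 101) has even weight and outputs zeros.
print(conv_encode([1] * 6, ("5", "7"), init=1))
```

The result does not depend on which end of the tap string is taken as the current input, since only the generator weights matter for a constant input.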

To ensure rotational invariance for code (2), as a necessary and sufficient
condition, one has to ensure that every valid code sequence
$A^{(1)}$ of code (1) also belongs to the set of code sequences $A^{(2)}$
of code (2):

$\forall I^{(1)} \; \exists I^{(2)} : A^{(1)} = I^{(1)} \cdot G^{(1)} = I^{(2)} \cdot G^{(2)} = A^{(2)}$.

($I^{(j)}$: info series, $G^{(j)}$: Forney generator matrix)

¹ $k^{(j)}$: number of info bits per frame, code rate $R^{(j)} = k^{(j)}/n^{(j)}$

As this equation has to be fulfilled for arbitrary $I^{(1)}$, $I^{(2)}$ appears
as a function of $I^{(1)}$. A possible approach for the construction of
code (2) is to define the components of $I^{(2)} = (I_1^{(2)}, \ldots, I_{k^{(2)}}^{(2)})$
as shifted versions of $I^{(1)}$ (assuming $k^{(1)} = 1$):

$I_\lambda^{(2)} = I^{(1)} \cdot D^{j_\lambda}$  ($k^{(1)} = 1$, $\lambda = 1, \ldots, k^{(2)}$, $j_\lambda \in \{0, \ldots, L^{(1)} - 1\}$).

$L^{(j)}$ is the constraint length of code (j) (not multiplied with $k^{(j)}$),
$D$ is a time delay factor ($z^{-1}$ of the Z-transform).

There has to be at least one $I_\lambda^{(2)} = I^{(1)}$, i.e. $j_\lambda = 0$, in order to
express the low-order term $D^0 = 1$, appearing in $G^{(1)}$, by means
of $G^{(2)}$. Furthermore, one $j_\lambda$ has to equal $j_\lambda = L^{(1)} - L^{(2)}$. This is
necessary as the term $D^{L^{(1)} - 1}$ appearing in $G^{(1)}$ has to be expressed
by $G^{(2)}$ with the maximum exponent $L^{(2)} - 1$. For reasons of
decoding complexity it is useful to have $L^{(2)} < L^{(1)}$, because $k^{(2)}$
is usually greater than $k^{(1)}$. This can be achieved by the proposed
construction leading to a considerably low decoding complexity.

Some results are given subsequently. A coding scheme with an
asymptotic coding gain of 6 dB, e.g., has a complexity of 4 states
for the first stage and 8 states for the second (and, maybe, additionally
the Wagner decoding of a parity-check code as a third
stage).


Code(l) 

Code  (2) 

Gen.  non-rec. 

(4,7,7) 

(10,13,15),  (16,13,15) 

i  3,6 

1  2.3 

Gain  /  dB 

4.7 

Gen.  non-rec. 

(15,15,13) 

(51,61,73) 

^49 

i3,5 

Gain  /  dB 

6.5 

Gen.  non-rec. 

(1,2,7,7) 

(46,52,61,73) 

I,  3,8 

2  2  4 

Gain  /  dB 

6 

Differential 

I. Introduction

The idea behind rotationally invariant codes is to find an encoder
that ensures the following: given any coded sequence, if we rotate
each symbol through a fixed rotational symmetry, then the
new sequence of rotated symbols is also a valid code sequence. If
this is the case, we may use differential encoding and decoding to
overcome the effects of phase rotation.

In this work, we describe an approach to the design of rotationally
invariant codes using multilevel coding. This technique allows
the designer to achieve, a priori, a given performance level, as well
as being invariant to rotations through constellation symmetries.

II.  Rotational  Invariance  and  Multilevel  Codes 

It has been argued (see e.g., [1]) that a “natural labeling” is best
for achieving rotational invariance. In Figure 1, we illustrate natural
labeling on a 16-QAM constellation. Below each point is a
binary label. Note that for the 16-QAM constellation, the two
least significant bits are not rotationally invariant, while the two
most significant bits are already invariant to rotations through
multiples of π/2 radians. This leads us to the following observation:

Observation 1: Only the two least significant bits (for QAM)
need to be encoded in such a way as to make them rotationally
invariant.

The theory behind multilevel codes involves partitioning the
signal space into subsets. The multilevel code employs an L-level
code, $C = [C_1, \ldots, C_L]$, where the $C_i$ are component codes. Each
component code is responsible for selection of its corresponding
subset, $a_i$; i.e., C is the set of all sequences $\{(a_1^t, \ldots, a_L^t)\}$ of subsets
in the constellation that satisfy $(a_i^t) \in C_i$ for all $i = 1, 2, \ldots, L$
[2].

We note in Figure 1 that the two LSBs are affected by a
rotation. This again yields an observation:

Observation 2: Both codes $C_1$ and $C_2$ must provide rotational
invariance for a 16-QAM constellation.

It can be shown that for multilevel codes, the condition of rotational
invariance is given by the following.

90 Degree Rotational Invariance: 90 degree rotational invariance
is guaranteed if the codes $C_1$ and $C_2$ (assumed linear) meet
the following criteria:

1. Code $C_1$ must contain the all-ones sequence.


Fig. 1. 16 QAM constellation.


These conditions are the same as those found in [3], but couched
in different terms. In [3], this method is extended to M-PSK signaling.
However, the conditions for rotational invariance become
harder to meet for M greater than 4. The importance of these
observations is that now we have a constructive method of finding
rotationally invariant codes. It is a simple matter to find codes
that achieve coding gains of 2 to 5 dB with little complexity.
For example, binary BCH codes of the same length meet these
conditions. The design of convolutional codes that meet these
conditions can be difficult. However, through the use of generator
matrix descriptions of the codes, we have been able to find
a systematic method for designing them. One approach is to extend
the convolutional code over two time epochs and then prune
paths. A second approach is to delete generator polynomials from
a high rate code to form the subcode. And a third approach is
to form new generator polynomials from a high rate code by multiplying
them with what we call sequence limiting polynomials.
All three approaches involve some trial and error in finding the
codes with the best distance, and ensuring the all-ones sequence
remains in the subcode.
2. Code $C_1$ must be a subcode of code $C_2$.
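
The two criteria for 90 degree rotational invariance (the first code contains the all-ones word and is a subcode of the second) are easy to test for linear block codes given by generator matrices. The sketch below is an illustration, not part of the paper: the helper names are hypothetical, and the example pair (the [8,4] first-order Reed-Muller code inside the [8,7] even-weight code) is chosen here only because it satisfies both criteria.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a list of rows stored as integer bit masks."""
    pivots = {}                      # leading-bit position -> reduced row
    for r in rows:
        while r:
            h = r.bit_length() - 1
            if h not in pivots:
                pivots[h] = r
                break
            r ^= pivots[h]           # cancel the leading bit and keep reducing
    return len(pivots)


def in_code(gen_rows, word):
    """True if word lies in the row space of the generator matrix."""
    return gf2_rank(gen_rows + [word]) == gf2_rank(gen_rows)


def is_subcode(gen_small, gen_big):
    """True if every generator row of the small code lies in the big code."""
    return all(in_code(gen_big, row) for row in gen_small)


N = 8
ALL_ONES = (1 << N) - 1

# Illustrative C1: first-order Reed-Muller code of length 8, dimension 4.
G1 = [0b11111111, 0b01010101, 0b00110011, 0b00001111]
# Illustrative C2: even-weight (single parity check) code of length 8, dimension 7.
G2 = [(1 << i) | 1 for i in range(1, N)]

print(in_code(G1, ALL_ONES))   # criterion 1
print(is_subcode(G1, G2))      # criterion 2
```

Both checks reduce to rank comparisons, so the same helpers apply to any pair of binary linear codes of equal length.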
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JV-dimensional  (N-D)  signals  are  generated  by  selecting  JV  an  orthogonal 

basis  Vii(0.V^(l) . V’w(0-  The  data  vector  x  =  (ii,ij,...,*w)  is  carried 

over  the  channel  by  the  signal  s(l)  =  *jV'j(0-  Here  we  assume  that  the 

symbols  ij  take  on  values  ±1.  At  the  receiver,  the  data  vector  is  recovered  by 
exploiting  the  orthogonality  of  the  basis  functions:  a  bank  of  JV  filters,  each 
matched  to  one  V’i(<),  gives  x  at  its  output. 

In this paper we describe a technique to generate an N-D basis for transmission
over bandlimited channels. The idea here is the following. Assume
we have a set of N orthogonal signals $\Phi = \{\phi_1(t), \ldots, \phi_N(t)\}$. We generate a
2N-dimensional basis by taking the products $\xi_1(t)\Phi$ and $\xi_2(t)\Phi$, with $\xi_1(t)$
and $\xi_2(t)$ chosen properly.

As an application of this procedure, we get Q²PSK [1, 2] by choosing $\Phi =
\{p(t), q(t)\}$, $\xi_1(t) = \sin \omega_c t$, $\xi_2(t) = \cos \omega_c t$. Similarly, a four-dimensional
set of signals defined over the interval $(-2T_b, 6T_b)$, $T_b$ the bit duration, can
be generated by choosing

where $\Pi(t/4T_b) = 1$ if $|t| < 2T_b$, and $= 0$ otherwise. Inspection of (1) shows
that $\phi_1(t)$ and $\phi_2(t)$ are orthonormal. An eight-dimensional signal basis can
now be obtained by taking the products $\{a_{ij}(t) \cos \omega_c t, a_{ij}(t) \sin \omega_c t\}$, where

aij(0  =  A)«i(l)4,(0.  <,J  =  1,2, 

and the constants $A_{ij}$ may take on any non-zero value. Here $A_{ij}$ are chosen so
as to have $\|a_{11}(t)\| = \|a_{22}(t)\| = 0.5817$ and $\|a_{12}(t)\| = \|a_{21}(t)\| = 0.8134$.
The cumulative power spectral density is shown in Fig. 1. The spectrum of
8D-4P2C is more compact than that of QPSK and of Q²PSK.


Figure 1: Cumulative power spectral density of QPSK, Q²PSK, and 8D-4P2C:
Percentage of signal power in the bandwidth $BT_b$.

We can transmit the 8-D vector $x = (x_1, \ldots, x_8)$ through the signal

$s(t) = s_c(t) \cos \omega_c t + s_s(t) \sin \omega_c t$  (3)

where

$s_c(t) = x_1 a_{11}(t) + x_2 a_{12}(t) + x_5 a_{21}(t) + x_6 a_{22}(t)$

and

$s_s(t) = x_3 a_{11}(t) + x_4 a_{12}(t) + x_7 a_{21}(t) + x_8 a_{22}(t)$.

This modulation scheme can be interpreted from a different point of view
by defining the two waveforms $\beta_1(t) = a_{11}(t) + a_{12}(t)$ and $\beta_2(t) = -a_{11}(t) +
a_{12}(t)$, where $-2T_b < t < 2T_b$. Moreover, observe that we have $a_{22}(t) =
-a_{11}(t - 4T_b)$ and $a_{21}(t) = a_{12}(t - 4T_b)$. Consequently, we can write for
example:

$a_{11}(t) + a_{12}(t) + a_{21}(t) - a_{22}(t) = \beta_1(t) + \beta_1(t - 4T_b)$,

and similar relations hold for all the possible values of the 4-tuple $(x_1, x_2, x_5, x_6)$.
Thus, we can represent our modulation scheme by writing the transmitted signal
in the form

$s_c(t) = \pm\beta_i(t) \pm \beta_j(t - 4T_b)$,  (4)

$s_s(t) = \pm\beta_i(t) \pm \beta_j(t - 4T_b)$,  (5)

where $i, j = 1, 2$. $\beta_1(t)$ and $\beta_2(t)$ are orthonormal pulses.

Optimum demodulation of signal (3) transmitted over the AWGN channel
can be done in the standard way by exploiting the orthogonality of signals.
Here we examine a suboptimum demodulator, which exploits the special
structure of our basis signals. To simplify our presentation, consider only the
in-phase branch of the demodulator, and assume that noise is not present.
The demodulator outputs the signal

$r(t) = \pm\beta_i(t) \pm \beta_j(t - 4T_b)$.  (6)

We observe that the first term in the right-hand side of (6) is non-zero in the
first half of the $8T_b$ symbol interval, while the other term is non-zero in the
second half.

The demodulator structure can be considerably simplified by avoiding multiplication
of the received signal by $a_{ij}(t)$. Upon observation of signal (6) we
form the new signal

$\pm\beta_i(t) \mp \beta_j(t - 4T_b)$  (7)

obtained by inverting the signal polarity in the second half of the observation
interval. By taking the sum $\sigma(t)$ and the difference $\delta(t)$ of the observed
signal and the signal with the polarity reversed, we obtain $\sigma(t) = \pm 2\beta_i(t)$ and
$\delta(t) = \pm 2\beta_j(t - 4T_b)$. Thus, the detection problem is reduced to discriminating
between the two pulses $\beta_1(t)$, $\beta_2(t)$, and the polarities of these pulses. This
can be done by sampling twice in each subinterval of duration $4T_b$, each time
comparing the sample value with a suitable threshold.

Error probability was simulated for transmission over a channel affected
by additive white Gaussian noise and intersymbol interference and with a
suboptimum detection strategy. Intersymbol interference was modeled by
introducing a transmitting filter $H_T(f)$ and an equalizing filter $H_E(f)$ followed
by a receiving filter $H_R(f)$ at the receiver’s front end. $H_T(f)$ was selected
so as to achieve the requirements of FCC standards. The equivalent noise
bandwidth of the overall filter is $0.333/T_b$. The model also includes
the effect of a carrier recovery circuit and of a symbol timing recovery unit.
Fig.  2  shows  that  the  degradation  due  to  intersymbol  interference  is  on  the 
order  of  1  dB  for  a  bit  error  probability  10'*.  The  loss  in  performance  with 
respect  to  orthogonal  eight-dimensional  modulation  over  the  AWGN  channel 
(and  with  perfect  carrier  and  timing  recovery)  is  around  2  dB  for  the  same 
error  probability,  but  the  spectrum  of  8D-4P2C  is  more  compact. 


Figure 2: Bit error probability of 8D-4P2C for additive white Gaussian noise
and intersymbol-interference channel (equivalent noise bandwidth $0.33/T_b$)
vs. $E_b/N_0$. Performance of orthogonal 4- and 8-dimensional schemes and of
QPSK is also shown for comparison.

Summary: We propose a new combined coded modulation
construction which gives a reduced decoding complexity.
It is a generalization of the constructions of
Ginzburg [1] and Ungerboeck [2] and is based on splitting
a multidimensional alphabet with $2^k$, $k \ge 2$, symbols
into k binary alphabets. The encoder consists of a
set of k binary convolutional encoders, elementary encoders,
operating at code rates $R_1, R_2, \ldots, R_k$, where
$R_1 < R_2 < \ldots < R_k$. The data bits are split into k
streams, each encoded by one of the elementary encoders.
The set of k elementary encoder outputs is mapped onto
the set of $2^k$-ary modulator symbols. The decoder consists
of k elementary decoders. The decoding is performed
step by step beginning with the first elementary decoder,
then the second etc. Each elementary decoder uses information
from the outputs of the previous decoders. The
code rate and memory of each elementary encoder is chosen
such that the elementary decoders have approximately
the same complexity and reliability.

We describe this method in more detail for the Gaussian
channel and 4-PSK with soft decisions. Our rate R
encoder consists of two (elementary) parallel binary rates
$R_1$ and $R_2$ convolutional encoders, where $R = R_1 + R_2$.
Each encoder generates one binary code symbol per time
unit. The pairs of code symbols are represented as
numbers j written in binary representation.
These numbers are mapped into modulation signals

$s_j(t) = \cos(\omega t + \varphi_j)$,

where $\varphi_j = j\pi/2$.

The decoder consists of two Viterbi decoders. The first
one, corresponding to the first encoder, operates without
taking the output sequence of the second encoder into account,
i.e., it makes its estimates based on the conditional
probability distribution for the received signal given that
the code sequence $u^{(1)}$ was transmitted.

The second Viterbi decoder estimates the second code
sequence based not only on the second received sequence
but also on the estimated code sequence from the first
Viterbi decoder.

If the first decoder output is error free, then the second
decoder knows exactly which of the signal pairs
$(s_0(t), s_2(t))$ or $(s_1(t), s_3(t))$ corresponds to each
code symbol. Hence, the decoding process is reduced to
decoding of BPSK signals.

In this paper we prove that we can choose the design parameters
of the encoders and decoders such that for given
reliability and complexity the transmission rate R for our
scheme is greater than that for the conventional coded
modulation scheme. Estimations of the system's performance
for 4-PSK, 8-PSK, and 16-PSK (see fig.) with soft
decisions show coding gains of about 0.5 dB compared to
the conventional constructions.
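
The reduction of the second decoding stage to BPSK can be made concrete with a small sketch. It is an illustration under an explicit assumption: the label is taken as j = b1 + 2*b2, with b1 the first encoder's bit, which is consistent with the signal pairs named above but is not stated in that form in the text. Function names are hypothetical, and signals are represented by their complex baseband points.

```python
import cmath
import math


def psk4(j):
    """Complex baseband point of s_j(t) = cos(wt + j*pi/2)."""
    return cmath.exp(1j * j * math.pi / 2)


def candidates(b1):
    """Signals still possible once the first-stage bit b1 is known.

    Assumed labelling (not given explicitly in the text): j = b1 + 2*b2.
    """
    return [psk4(b1 + 2 * b2) for b2 in (0, 1)]


def second_stage_decision(r, b1):
    """Pick the second-stage bit whose candidate signal is closest to r."""
    c0, c1 = candidates(b1)
    return 0 if abs(r - c0) <= abs(r - c1) else 1


for b1 in (0, 1):
    c0, c1 = candidates(b1)
    # The two remaining candidates are antipodal, i.e. a BPSK pair.
    print(b1, abs(c0 + c1) < 1e-12)

# A slightly disturbed s_3 with b1 = 1 known is decided as b2 = 1.
print(second_stage_decision(psk4(3) + 0.1, 1))
```

With this labelling the first-stage bit selects either the pair (s_0, s_2) or the pair (s_1, s_3), each of which is antipodal.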

This paper presents two coded modulation schemes for achieving
reliable data transmission over the AWGN and the Rayleigh fading
channels with large coding gains, high spectral efficiency, and reduced
decoding complexity. In the first scheme, coded modulation
[1] is used in conjunction with concatenation [2]. This combination
of coded modulation and concatenation is known as concatenated
coded modulation. In concatenated coded modulation schemes, the
concatenation can be carried out either in single or multiple levels.
The inner codes are bandwidth efficient modulation codes, and the
outer codes are Reed-Solomon (RS) codes. If the inner codes, outer
codes, and the level of concatenation are properly chosen, good error
performance can be achieved with reduced decoding complexity, high
spectral efficiency, and large coding gain.

In a single-level concatenated coded modulation scheme, a single
RS code is concatenated with a single modulation code. In this
paper, several single-level concatenated coded modulation schemes
for the AWGN and the Rayleigh fading channels are proposed. In
these schemes, both block and trellis modulation codes are being
used as the inner codes and they are designed for either the AWGN
channel or the Rayleigh fading channel and to have simple decoding
complexity. In a q-level concatenated coded modulation scheme, q
pairs of outer and inner codes are used. RS codes with different levels
of error correcting capabilities are used as outer codes, and coset
codes constructed from a block modulation code and its subcodes are
used as the inner codes. The encoding and decoding are accomplished
in q levels respectively. The decoding at each level consists of inner
and outer code decodings. Closest coset decoding is performed at
the first level inner code decoding based on the received sequence
to obtain a sequence of estimated coset representatives for the first
level inner code. This sequence of estimated coset representatives is
converted to RS code symbols and decoded based on the first level RS
outer code. From the decoded RS symbols, an estimated sequence
of coset representatives is formed and the estimates are passed to
the second level inner code decoder where the decoding process is
repeated. Successive applications of closest coset decoding at each
of the individual levels give estimates of the coset representatives at
all the q levels. In this paper, several multilevel concatenated coded
modulation schemes are proposed for the AWGN and the Rayleigh
fading channels, and they achieve very good performance and large
coding gains over uncoded reference systems of the same spectral
efficiencies.

In the second proposed scheme, multilevel coded modulation is
combined with multiple product codes to form two-dimensional product
modulation codes. In a product modulation code, the column
code is a block modulation code and algebraic codes of various error
correcting capabilities are used as the row codes. Methods for
constructing good product modulation codes for either the AWGN
channel or the Rayleigh fading channel are proposed. A multi-stage
decoding algorithm for these codes is devised, which reduces the decoding
complexity while achieving good error performance.

Error performance bounds have been derived for both proposed
schemes, which along with simulation results show that they achieve
good error performance, large coding gains, and high spectral efficiency
with reduced decoding complexity. The proposed schemes
outperform the ones available in literature both in terms of coding
gain and decoding complexity.

As an example, consider a single-level concatenated coded modulation
scheme, in which the outer code is the NASA standard (255,
223) RS code over GF($2^8$) and the inner code is a 2 × 2-dimensional
trellis 8-PSK modulation code. For inner code construction, we
choose the following three binary codes: $A_1 = (2, 1, 2)$, $A_2 =
(2, 2, 1)$, and $A_3 = (2, 2, 1)$. These three binary codes are used
to form a 2 × 2-dimensional 8-PSK signal space, denoted $\Lambda_0 =
\lambda((2, 1, 2) * (2, 2, 1) * (2, 2, 1))$, which consists of 32 signal points, each
consisting of two 8-PSK signals. The intra-set distance
of $\Lambda_0$ is 1.172. To partition $\Lambda_0$, we choose the following
binary codes: $B_1 = (2, 0, \infty)$, $B_2 = (2, 1, 2)$, and $B_3 = (2, 2, 1)$.
These binary codes are then used to form a subspace of $\Lambda_0$, denoted
$\Lambda_1 = \lambda((2, 0, \infty) * (2, 1, 2) * (2, 2, 1))$, which consists of 8 signal points.
The intra-set distance of $\Lambda_1$ is 4. The coset code $\Lambda_0/\Lambda_1$ consists of 4
cosets, each coset containing 8 signal points. A rate-1/2 convolutional
code of constraint length 3 and minimum free branch distance
$d_{B\text{-free}} = 3$ is chosen for the construction of the 2 × 2-dimensional
trellis 8-PSK code. This code is generated by $G(D) = (1 + D^2, D)$
and has a 4-state trellis diagram. The schematic diagram for constructing
the desired 2 × 2-dimensional trellis 8-PSK code is shown in
Figure 1. At each time instant, 4 information bits are encoded into
two 8-PSK signals. All the possible signal sequences at the output of
the overall encoder form a 2 × 2-dimensional trellis 8-PSK code. This
code has a 4-state trellis diagram in which two adjacent states are
connected by 8 parallel branches and each branch corresponds to
a signal point in a coset of $\Lambda_0/\Lambda_1$. The spectral efficiency of the code
is 2 bits/signal. The minimum free squared Euclidean distance
of the code is 4. Figure 2 shows the bit-error-performance of the
overall concatenated coded modulation scheme.

This research was supported by NASA Grant NAG 5-931 and NSF Grant
NCR-9115400.
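
The sizes and intra-set distances quoted for the two signal spaces can be reproduced with the usual multilevel rule, in which each component code contributes its Hamming distance times the squared distance of its partition level. The sketch below assumes that rule, unit-energy 8-PSK and the standard partition chain 8-PSK, 4-PSK, 2-PSK; the function and constant names are hypothetical, and the computed distance is in general only a lower bound.

```python
import math

# Squared Euclidean distances of the 8-PSK partition chain for
# unit-energy signals: within 8-PSK, within 4-PSK, within 2-PSK.
DELTA_SQ = [4 * math.sin(math.pi / 8) ** 2, 2.0, 4.0]


def signal_space(component_codes):
    """Size and squared intra-set distance estimate of a multilevel signal space.

    component_codes: three (n, k, d) binary block codes, one per label bit,
    ordered from the first partition level to the last.
    """
    size = 2 ** sum(k for _, k, _ in component_codes)
    dist = min(d * delta for (_, _, d), delta in zip(component_codes, DELTA_SQ))
    return size, dist


INF = math.inf
space_0 = signal_space([(2, 1, 2), (2, 2, 1), (2, 2, 1)])
space_1 = signal_space([(2, 0, INF), (2, 1, 2), (2, 2, 1)])
print(space_0)   # 32 points, distance about 1.172
print(space_1)   # 8 points, distance 4
print(space_0[0] // space_1[0])   # number of cosets
```

The ratio of the two sizes gives the 4 cosets of 8 points each that label the parallel branches of the trellis.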

Abstract: In this paper, a combination of the two coding techniques
given by Imai-Hirakawa and Pottie-Taylor is applied to the
ternary (+,0,-) line code design problem. New ternary coding systems
with reduced decoding complexities and improved error performance
compared to those obtained by the classical Ungerboeck trellis coding
approach are obtained. The decoding complexity for high coding rates
is reduced by the proper choice of the punctured convolutional component
codes for each partitioning level. A spectral null at zero frequency
is  obtained  by  the  use  of  a  I/i  rate  unit  memory  convolutional  encoder 
at  the  last  partitioning  level  which  selects  line  codewords  with  opposite 
disparities  in  an  alternated  fashion  so  that  the  running  digital  sum 
vcdues  vary  in  a  finite  interval 

Summary 

In baseband digital transmission systems, the signal to be
transmitted must have zero dc component and power spectral components
that are as small as possible at low frequencies, along with sufficient
timing content. This avoids any dc power feeding over the line, reduces
the low-frequency noise and allows the extraction of clock information.
If the baseband digital signal is transmitted as a binary unipolar
sequence, these requirements are not satisfied. For this purpose, line
coding techniques are employed [1]. A line encoder transforms the
binary sequences fed to its input at a rate of $R$ bit/sec into
$L$-level ($L > 2$) sequences at a rate of $R'$ symbol/sec at its output,
and provides a redundancy of $R' \log_2 L - R$ in information rate.

The application of Ungerboeck's [2] trellis coding technique to
the baseband ternary (+,0,-) line code design problem is realized
in [3], where the basic requirements for baseband digital signal
transmission and the error performance improvement by trellis coding
are considered jointly during the design phase. For rates $R =
n/(n+1)$ ($n = 1,2,3$), ternary line encoders are designed based on a
$2^{n+1}$-element alphabet, which consists of codewords with 0, +1 and -1
polarities, and a proposed codeword assignment model. Coding gains
of 3 - 3.52 dB are obtained with respect to the classical paired-selected
ternary line code.

In this paper, based on the multilevel coding approach, some new
ternary line encoders with lower decoding complexity and improved
coding gains, compared to those obtained by Ungerboeck's technique,
are proposed. The multilevel coding scheme given first by Imai
and Hirakawa [4] employs, at each signalling interval, one output bit of
each of several binary error control encoders to construct the signal to
be transmitted. An important advantage of multilevel coding is
the possibility of suboptimum multistage decoding of each code with
decoded information transferred from one stage to the next. This
reduces the decoding complexity at each stage and therefore for
the overall system. Pottie and Taylor [5] have presented a generalized
version of the multilevel coding technique where $m$ output bits of each
rate $R_i = k_i/m$ component encoder are used to partition the signal
subsets determined at the preceding stage into $2^m$ new subsets with
fewer signals. In our work, we use a combination of these techniques
to obtain increasing minimum subset distances $\Delta_0 < \Delta_1 <
\cdots < \Delta_{M-1}$ given by the set partitioning method. Here, $M$
represents the number of coding levels. Our aim is to form the optimal
ternary codeword alphabets for several codeword lengths in order to
achieve asymptotic coding gains at the lowest possible decoding complexity.


The first step of the multilevel line code design procedure is to
construct an alphabet consisting of $2^{n+1}$ maximally distinct ternary
(+,0,-) $n$-length codewords. For this purpose, we use ternary
sequences with only 0, ±2 disparities (word digital sums). Half of the
$2^{n+1}$ codewords are chosen having zero disparity and the rest having
equal numbers of positive (+2) and negative (-2) disparities. The
codeword alphabet $S_0$ is then divided into $M$-level nonoverlapping
subsets to form a partition chain $S_0/S_1/\cdots/S_{M-1}$ with minimum
subset distances $\Delta_0 < \Delta_1 < \cdots < \Delta_{M-1}$, respectively.
To each partitioning level $S_{i-1}/S_i$, $i = 1, 2, \ldots, M-2$, a binary
component code is associated

with free Hamming distance $d_H^i$, related to the free Euclidean distance
$d_{fED}$ of the overall system by

where $d_{fM-1}$ is the free Euclidean distance of the encoder
corresponding to the last partitioning level $M-1$. At the partitioning
levels $i = 1, 2, \ldots, M-2$, we use punctured convolutional codes
$C_1, C_2, \ldots, C_{M-2}$ due to their relatively lower decoding
complexities. At the last level $M-1$, where the subsets including only one
ternary codeword are obtained, we take the basic line requirements into
account and use in all cases a rate-1/2 unit-memory encoder which chooses
line codewords with +2 and -2 disparities in an alternating fashion. Thus,
we restrict the values of the running digital sum (RDS) at each line coding
step to +2, 0 and -2. Note that, at each signalling interval, the two input
bits of this encoder are used to determine a ternary line codeword.
Therefore, the linear relation between Hamming and Euclidean distances
is not valid for this encoding level. Compared to the encoders given in [3]
using Ungerboeck's approach, the use of this rate-1/2 encoder results in a
slight loss in the number of data bits transmitted per ternary codeword,
in exchange for systems with good error and complexity properties.
Thus, for the sake of comparison, we use the asymptotic
coding gain (ACG) defined as

where $R_U$ and $R_M$ represent the coding rates (number of data bits per
transmitted  ternary  codeword)  of  Ungerboeck’s  type  and  multilevel 
ternary  line  codes,  respectively.  E,  is  the  average  codeword  energy. 
ACGs up to 2.5 dB are obtained with significantly reduced decoding
complexities. 
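
The effect of alternating +2 and -2 disparities on the running digital sum can be illustrated with a small sketch. It is not the authors' encoder: it simply assumes that a nonzero-disparity word can be replaced by its polarity-inverted version whenever its disparity would repeat the sign of the previous nonzero disparity. Symbols are written as +1, 0, -1, and all names are hypothetical.

```python
def disparity(word):
    """Word digital sum of a ternary word given as a tuple of +1, 0, -1."""
    return sum(word)


def alternate_disparities(words):
    """Invert words where needed so nonzero disparities alternate in sign.

    Assumption for illustration: inverting a word (negating every symbol)
    is an acceptable way to obtain the opposite-disparity codeword.
    """
    last_nonzero = 0
    out = []
    for w in words:
        d = disparity(w)
        if d != 0:
            if d == last_nonzero:      # same sign as previous nonzero word
                w = tuple(-s for s in w)
                d = -d
            last_nonzero = d
        out.append(w)
    return out


def running_digital_sums(words):
    """RDS observed at the end of each codeword."""
    rds, sums = 0, []
    for w in words:
        rds += disparity(w)
        sums.append(rds)
    return sums


if __name__ == "__main__":
    words = [(1, 1, 0), (1, 0, 1), (1, -1, 0), (0, 1, 1)]
    print(running_digital_sums(alternate_disparities(words)))
```

With disparities restricted to 0 and ±2 and strict alternation of the nonzero ones, the RDS at codeword boundaries stays within {-2, 0, +2}, which is the bounded-RDS property the summary relies on for the spectral null at dc.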
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Abstract 

The trellis construction methods of Wolf [1], Massey [2], and
Forney [3] for general linear block codes are briefly reviewed. An
isomorphism between a trellis constructed using Massey's method
and one constructed using Wolf's method is derived. It is confirmed
that Wolf's and Massey's trellis constructions also yield minimal
trellises. Two simplified methods for minimal trellis construction are
presented, along with a method to calculate the trellis dimensions
that is an alternative to the methods of [2] and [3]. An improvement
is found to a lower bound on the maximum trellis dimension due to
Muder [4]. It is shown that when equivalent codes are constructed by
permutations of the symbol positions the resulting trellis dimensions
are fixed near either end, while in the central portion of the trellis
the dimensions vary between an attainable upper bound and a lower
bound. From the lower bound on the trellis dimensions in the central
portion of the trellis it is seen that only codes (and their duals) that
meet a certain condition on their minimum distances can possibly
have a trellis with a relatively small number of states.

Summary 

Compared to convolutional codes, the trellis representations of
linear block codes have been discussed much less frequently, with
only a few papers appearing within twenty years of the introduction
by Forney of the convolutional code trellis. Of these papers, the
introduction of linear block code trellises appears in [1][2][3].
Recently, in [3] and [4], trellis construction and the trellis state-space
dimensions were re-examined. Here, we continue this examination of
trellis construction and dimensionality for general linear block codes.

Wolf's trellis construction for a general linear $(n, k)$ block code
$C$ begins by generating an unexpurgated trellis that represents all
uncoded sequences of length $n$. The trellis states are taken to be a
partially formed syndrome vector. The code trellis is then formed
by expurgating all paths that do not lead to the zero state at depth $n$.
Massey's trellis construction for a general linear block code assigns
a state at depth $l$ in the trellis to be the vector of parity symbols
in the codeword tail, as determined by the information symbols in
the codeword head.* An isomorphism between a trellis constructed
using Massey's method and one constructed using Wolf's method is
found by comparing the state assignment equations. To show that
these methods produce minimal trellises, we use a condition from [4]
that specifies that a state at depth $l$ must be assigned to those heads
of the code tree that satisfy the equivalence relation

where the relation indicates that $c^h$ and $c'^h$ share the same set of tails.
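
Wolf's construction as summarized above can be sketched for binary codes. The sketch assumes a parity-check matrix $H$ is available; the state at depth $j$ is the partial syndrome of the first $j$ symbols, and expurgation keeps only states that are both reachable from the zero state and able to reach the zero state at depth $n$. The function name and the example matrix (a (7,4) Hamming code) are illustrative choices, not taken from the paper.

```python
def wolf_state_profile(H):
    """Number of surviving states at depths 0..n of the syndrome trellis.

    H is a binary parity-check matrix given as a list of rows (lists of 0/1).
    """
    r, n = len(H), len(H[0])
    # Pack each column of H into an integer so a syndrome update is an XOR.
    cols = [sum(H[i][j] << i for i in range(r)) for j in range(n)]

    # Forward pass: partial syndromes reachable from the zero state.
    fwd = [{0}]
    for j in range(n):
        fwd.append({s ^ (bit * cols[j]) for s in fwd[-1] for bit in (0, 1)})

    # Backward pass: partial syndromes from which zero is reachable at depth n.
    bwd = [set() for _ in range(n + 1)]
    bwd[n] = {0}
    for j in range(n - 1, -1, -1):
        bwd[j] = {s ^ (bit * cols[j]) for s in bwd[j + 1] for bit in (0, 1)}

    # Expurgation: keep states that lie on some path ending in the zero state.
    return [len(fwd[j] & bwd[j]) for j in range(n + 1)]


# Illustrative (7,4) Hamming parity-check matrix (columns are 1..7 in binary).
H_HAMMING = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

if __name__ == "__main__":
    print(wolf_state_profile(H_HAMMING))
```

For this column ordering the state profile is 1, 2, 4, 4, 8, 4, 2, 1, so the maximum trellis dimension is 3; a different symbol ordering can change the central part of the profile.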


* The head $c^h$ and the tail $c^t$ of a codeword $c \in C$ are the first $l$
symbols, and the last $n - l$ symbols of $c$, respectively, so that
$c = (c^h, c^t)$.

This work was supported in part by a Natural Sciences and Engineering Research
Council (NSERC) Scholarship, a B.C. Science Council G.R.E.A.T. Award, a B.C.
Advanced Systems Institute Scholarship, and by NSERC Grant OGP0001731.


Based on Wolf's and Massey's trellis state assignment methods,
we present two simplified methods for constructing minimal trellises.
The simplified trellis construction methods avoid the generation of
an unexpurgated trellis followed by expurgation of non-codeword
tails, as used in Wolf's method for general linear block codes. Both
methods also avoid matrix multiplications at each extension of a head
to form the trellis, as used in Massey's method for non-systematic
linear block codes or in Forney's method for general linear block
codes. The new methods should be useful for complete trellis
construction and for reduced-state tree/trellis searches.

A method for calculating the trellis dimensions is presented that
is an alternative to those of [2] or [3]. Using this method, an
improvement is found to Muder's lower bounds on the maximum
trellis dimension (denoted $s$) for linear block codes. Muder's lower
bounds can be summarized into a single expression,

$$s \ge \min\left(d_{\min} - 1 - \Delta,\; d_{\min}^{\perp} - 1 - \Delta^{\perp}\right)$$

where $d_{\min}$ and $d_{\min}^{\perp}$ are the minimum distances of $C$ and
its dual $C^{\perp}$, respectively; and where $\Delta \triangleq n - k -
(d_{\min} - 1)$ and $\Delta^{\perp} = k - (d_{\min}^{\perp} - 1)$. The
improved lower bound is

$$s \ge \min\left(d_{\min} - 1,\; d_{\min}^{\perp} - 1\right).$$

It is also shown that the trellis dimensions remain fixed near either
end of the trellis despite symbol position permutations, and that a
lower bound on the minimum trellis dimension in the central portion
of the trellis (denoted $s'$) is given by

$$s' \ge \max\left[0,\; \min\left(d_{\min} - 1 - \Delta,\; d_{\min}^{\perp} - 1 - \Delta^{\perp}\right)\right].$$

This establishes that only codes (and their duals) that have a
smallest minimum distance $\min(d_{\min}, d_{\min}^{\perp})$ significantly
less than the corresponding Singleton bound can possibly have a trellis
with few states relative to $\min(q^k, q^{n-k})$.
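
As a worked instance of the three bounds, the sketch below evaluates them from $n$, $k$, $d_{\min}$ and $d_{\min}^{\perp}$, reading the inequalities as "greater than or equal". The function name and the returned dictionary keys are illustrative; the example parameters are those of the (7,4,3) Hamming code, whose dual has minimum distance 4.

```python
def trellis_dimension_bounds(n, k, d_min, d_dual):
    """Evaluate the lower bounds on the trellis dimension quoted in the text."""
    delta = n - k - (d_min - 1)        # gap of C to the Singleton bound
    delta_dual = k - (d_dual - 1)      # gap of the dual code to its bound
    muder = min(d_min - 1 - delta, d_dual - 1 - delta_dual)
    improved = min(d_min - 1, d_dual - 1)
    central = max(0, muder)
    return {"muder": muder, "improved": improved, "central": central}


if __name__ == "__main__":
    # (7,4,3) Hamming code; its dual is the (7,3,4) simplex code.
    print(trellis_dimension_bounds(7, 4, 3, 4))
```

For the Hamming code this gives 1 for the summarized Muder bound, 2 for the improved bound, and 1 for the central-portion bound. For an MDS code both gaps are zero and the first two expressions coincide.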
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1 Introduction

Recently, much research has been conducted on $t$ (symmetric) error
correcting and all unidirectional error detecting (t-EC/AUED) codes based on
t-EC codes [1-5]. In this paper, we propose a new construction method for a
t-EC/AUED code which is constructed by appending redundant bits, which consist
of two parts, to the codewords of a base t-EC code. The first part of the
redundant bits is constructed systematically and the second by searching
algorithms. The mapping from the codewords of a t-EC base code to the two
parts of redundant bits is very simple and determined by the Hamming weights
of the codewords of the base code. It is shown that the proposed t-EC/AUED
codes achieve higher information rates and require smaller encoding tables
compared to the conventional ones.

We denote the number of 1 -> 0 crossovers from $X = (x_0, x_1, \ldots,
x_{n-1})$ to $Y = (y_0, y_1, \ldots, y_{n-1})$ ($x_i, y_i \in \{0,1\}$) by
$N(X,Y)$, and the Hamming weight of $X$ by $W(X)$. For $X$ and $Z = (z_0, z_1,
\ldots, z_{m-1})$, we denote the concatenation of $X$ and $Z$ by $XZ = (x_0,
x_1, \ldots, x_{n-1}, z_0, z_1, \ldots, z_{m-1})$. We also denote the
cardinality of a set $S$ by $|S|$, the least integer not less than $a$ by
$\lceil a \rceil$ and the greatest integer not greater than $a$ by
$\lfloor a \rfloor$.

In the following, the t-EC/AUED codes proposed in [1] and [2] are referred to
as code $C_0$ and $C_1'$, respectively.


2  Code  Construction 

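We denote an $(n, k, 2t+1)$ base code by $D$ and define $w_{\max} =
\max\{W(X) \mid X \in D\}$. Let $A(m) = \{A_0^{(m)}, A_1^{(m)}, \ldots,
A_{2m-1}^{(m)}\}$ be a set of binary $m$-tuples and $B_m(b,t) = \{B_0^{(m)},
B_1^{(m)}, \ldots, B_N^{(m)}\}$ ($N \ge \lfloor w_{\max}/2m \rfloor$) a set of
binary $b$-tuples satisfying

$$N(A_p^{(m)}, A_q^{(m)}) \ge \max\{\lceil (q-p)/2 \rceil, 0\}, \qquad (1)$$

$$N(B_u^{(m)}, B_v^{(m)}) \ge \min\{t+1,\, m(v-u)\}, \quad \text{for } u < v. \qquad (2)$$

Then, the proposed t-EC/AUED code $C_m$ is defined by

$$C_m = \{X A_p^{(m)} B_u^{(m)} \mid X \in D,\ A_p^{(m)} \in A(m),\ B_u^{(m)} \in B_m(b,t)\}, \qquad (3)$$

where

$$W(X) = 2mu + p, \quad 0 \le p < 2m. \qquad (4)$$

Theorem 1 Code $C_m$ is a t-EC/AUED code.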

Though a construction method for $A(m)$ is already given in [1], we propose a
new systematic construction method for $A(m)$, which introduces a hierarchy
into the class of proposed codes $\{C_m\}$ (see Theorem 3 below).

Lemma 1 Define $A(m) = \{A_0^{(m)}, A_1^{(m)}, \ldots, A_{2m-1}^{(m)}\}$ by

$$A_0^{(1)} = (1), \quad A_1^{(1)} = (0) \qquad (5)$$

and, for $m = 2, 3, \ldots$, by the recursion of Eq. (6), which expresses the
elements of $A(m)$ in terms of those of $A(m-1)$ [Eq. (6) is not legible in
the source].

Then, $A(m)$ satisfies Eq. (1).


It is shown that the proposed code has the following properties.

Theorem 2 For given $B_1(b, t)$, if there exists an $(n + 1 + b, k)$
t-EC/AUED code $C_1'$ [2], we can construct an $(n + 1 + b, k)$ t-EC/AUED code
$C_1$ by using the same $B_1(b, t)$. The converse is not always the case.

Theorem 3 $C_m$ ($m \ge 1$) can be regarded as $C_1$ or $C_0$.

Theorem 3 states that the information rate of $C_0$ is not less than that of
$C_m$ ($m \ge 1$) and the information rate of $C_1$ is not less than that of
$C_m$ ($m \ge 2$). However, it is easy to see that encoding the proposed code
$C_m$ ($m \ge 1$) requires tables of sizes $\lceil (n+1)/2m \rceil$ and $2m$
for mapping from $u$ to $B_u^{(m)}$ and from $p$ to $A_p^{(m)}$, while $C_0$
and $C_1'$ require tables of sizes $n$ and $\lfloor (n+1)/2 \rfloor$,
respectively. Therefore the size of the encoding table for the proposed code
$C_m$ ($m \ge 1$) is about $1/2m$ of that for $C_0$ and about $1/m$ of that
for $C_1'$, if $n$ is large.

3 Construction of $B_m(b, t)$

In order to obtain efficient codes by the proposed method, it is very
important to prepare $B_m(b, t)$ which has as many elements as possible for
fixed $b$ and $t$. For lack of space, we only show three algorithms to
construct $B_m(b, t)$ among six algorithms we considered. Algorithm 1 is
obtained by extending the algorithm for $B_1(b, t)$ given in [3] to
$B_m(b, t)$ and Algorithm 2 by modifying Algorithm 1. In Algorithm 1, we
denote by $\{X_{w,r} = (x_0^{w,r}, x_1^{w,r}, \ldots, x_{b-1}^{w,r})\}$
($r = 1, 2, \ldots, \binom{b}{w}$) the set of binary $b$-tuples of Hamming
weight $w$, numbered with $r$ according to the rule that $r_1 < r_2$ iff
$\sum_{i=0}^{b-1} x_i^{w,r_1} 2^i < \sum_{i=0}^{b-1} x_i^{w,r_2} 2^i$.
Algorithm 1 examines whether $X_{w,r}$ can be an element of $B_m(b, t)$ by the
function TEST. TEST($X_{w,r}$) outputs true if $\{B_0^{(m)}, B_1^{(m)}, \ldots,
B_u^{(m)}, B_{u+1}^{(m)} = X_{w,r}\}$ satisfies Eq. (2.6) in [1] for $m = 0$
and Eq. (2) for $m \ge 1$, otherwise outputs false.


Algorithm 1
begin
    B_0^(m) := (11...1); u := 0;
    for w := b - 1 downto 0 do
        for r := 1 to C(b, w) do
            if TEST(X_{w,r}) = true then
            begin
                u := u + 1; B_u^(m) := X_{w,r}
            end;
    B_m(b, t) := {B_0^(m), B_1^(m), ..., B_u^(m)}
end

Algorithm 2 Modify Algorithm 1 so as to search $\{X_{w,r}\}$ from $w = 1$ to
$w = b$ with the initial condition $B_0^{(m)} = (00 \ldots 0)$.

Algorithm 3 For a given $B_m(b, t)$, construct $B_0(b + m, t)$ by
concatenating $B_m(b, t)$ with $A(m)$ and construct $B_1(b + m - 1, t)$ by
concatenating $B_m(b, t)$ with the first $m - 1$ bits of $A(m)$.

As an example, we show in Table 1 the largest values of $|B_m(b, t)|$ among
those obtained by Algorithms 1 through 3 for several values of $b$ and $t$ in
comparison with the best known results so far.

We can see from Table 1 that many better results are obtained by the newly
proposed algorithms.

4 Performance of the Proposed Codes

As an example, we show in Table 2 the number of additional bits required to
construct SEC/AUED codes from single error correcting base codes by the
proposed method in comparison to the best known results so far [1-5] (shown in
the column "old"). From Table 2, we can see that the proposed codes need fewer
additional bits than the conventional ones.

Table 1: $|B_m(b, t)|$


t  =  1 

t»2 

t «  3 

t*4 
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2 
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2 
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4 
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6 

4 
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12 
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9 
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18 

10 
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11 
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’32(24) 

12 
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’94(72) 

|‘X32) 
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13 

— 
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•43(32) 

14 

— 
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•54(36) 

15 
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’396(392) 

’112(72) 
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17 
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19 

20 
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igrvVtl 

’212(180)  ’I02a2) 

’298(264) ’124(104) 
’496(488) ’168(156) 

-  216 

_  'JBfl 

’374(368) 


without  mark  are  the  results  due 

The new values obtained by the proposed algorithms are marked by a
number at the upper left corner which indicates the algorithm used.

The best known results are also shown in parentheses.
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Summary 


An $[n,k]$ tEC/AUED code is a tEC/AUED code
which is systematic on the first $k$ positions. For an
$[n,k]$ tEC/AUED code $C$, when $k$ is fixed we want to
minimize the length $n$. For fixed $k$, the lower and upper
bounds for the minimized length $n$ have been discussed
by many researchers. In this research, we improve
numerous existing lower and upper bounds. We first
discuss the method to improve the lower bounds.

Let $N(a, b) = |\{1 \le i \le n : a_i = 1 \wedge b_i = 0\}|$;
then it is well-known that a code $C$ is tEC/AUED iff
$\forall_{a,b \in C}\,[a \ne b \Rightarrow N(a,b) \ge t + 1]$. This implies
that a 0EC/AUED code is an antichain and a tEC/AUED code with
$t > 0$ is a special antichain. Therefore, the well-known
LYM inequality can be applied to these codes. Because
the requirement for tEC/AUED codes is much stronger
than the one for general antichains, we sharpen the LYM
inequality to the so-called weak and strong LYM
inequalities as follows.

Define 


Mn(m,t,i)  = 


m  +  t  —  2i  \  / 
t-2i  +  i  )\ 


n 


m  -  t  +  2i  \ 

J  J' 


Mn(m,  t,  0)  =  M„(m,  t,  0) , 

M„(m,  t,  i)  =  moi{  A<n(m,  t,  i  -  1),  A<„(m,  t,  «)},»>  0. 


Let  C  be  a  tEC/AUED  code,  then  we  have  the  fol¬ 
lowing  results. 

Weak  LYM  inequality  for  tEC/AUED  codes:  For 
each  i  with  0  <  i  <  t. 


<  1 . 


Next, we have new constructions for $[n,k]$ tEC/AUED
codes. Currently the best known constructions for
the $[n,k]$ tEC/AUED codes are based on the "descending
tail matrices" [1]. For each input message sequence,
T]  parity  check  bits  are  appended  to  make  it  a  codeword 
in  a  t  error  correcting  linear  code,  then  a  descending  tail 
is  appended  to  code  the  weights  of  the  codewords. 

Our  idea  is  to  find  the  inherent  relations  among  the 
message  sequences  and  design  the  tail  part  as  a  whole. 
Two  methods  (the  group  theoretic  method  and  the  lin¬ 
ear  code  syndrome  method)  are  found  to  efficiently  di¬ 
vide  the  message  sequences  into  “subcodes”,  and  then 
the tail words are used to code the indices of the
subcodes as well as the weights of the codewords. These
new constructions have complexities that are comparable
to codes constructed by the descending tail method. For
more details, one can see [4]. The following table shows
a part of the new lower and upper bounds.


r  : 

r 

r 

4 

6 

4 

8 

5 

6-7 

5-6 

9-12 

6 

6-8 

7 

9-13(15) 

7 

6-8(9) 

8 

9-15 

8 

7-8(9) 

9 

(9)10-15 

9 

7-8(10) 

10-11 

10-16(17) 

10-11 

7-9(10) 

12-13 

(10)11-16(17) 

12-14 

(7)8-10(11) 

14 

11-16(18) 

15 

8-10(12) 

15 

11-19 

16-17 

8-11(12) 

16-18 

(11)12-19 

18-19 

(8)9-11(12) 

19-21 

12-19 

20-25 

9-12 

22 

(12)13-19 

26 

(9)10-12 

23-25 

(12)13-20 

27-28 

(9)10-13 

26 

(12)13-20(21) 

29-30 

(9)10-13(14) 

27-28 

13-20(21) 

31 

10-13(14) 

29-30 

(13)14-20(21) 

31 

(13)14-20(22) 
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Strong LYM inequality for tEC/AUED codes:
For each $i$ with $0 \le i \le t$,


E 

C€C 


A<n(|c|,t,l) 

(w) 


<  1- 


By appropriately applying the above two inequalities
to $[n,k]$ tEC/AUED codes, most existing lower bounds
are improved by 1 bit and many of them are improved
by 2 bits. For details, one can see [1,2,3].
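
The tEC/AUED condition quoted in the summary, $N(a,b) \ge t+1$ for every ordered pair of distinct codewords, is easy to check directly on small codes. The sketch below is only an illustration of that definition; the function names are hypothetical and the code must contain at least two codewords.

```python
from itertools import permutations


def crossovers(a, b):
    """N(a, b): number of positions i with a_i = 1 and b_i = 0."""
    return sum(1 for ai, bi in zip(a, b) if ai == 1 and bi == 0)


def max_t(code):
    """Largest t for which the code is tEC/AUED, or -1 if none exists.

    A result of 0 means the code is an antichain (0EC/AUED); -1 means some
    codeword covers another, so not all unidirectional errors are detected.
    """
    return min(crossovers(a, b) for a, b in permutations(code, 2)) - 1


def is_tec_aued(code, t):
    """Check N(a, b) >= t + 1 for all ordered pairs of distinct codewords."""
    return max_t(code) >= t


if __name__ == "__main__":
    print(max_t([(1, 1, 0, 0), (0, 0, 1, 1)]))
    print(max_t([(1, 0), (0, 1)]))
    print(max_t([(1, 1), (1, 0)]))
```

Both directions $N(a,b)$ and $N(b,a)$ are checked, which is why ordered pairs are enumerated.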
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Consider the following communication scheme [1, 2]: a binary
vector of length $n$ is transmitted using $n$ parallel wires. Each wire
represents a coordinate of the vector. The propagation delay in
the wires varies. Arrival of a transition represents a 1 while
absence of a transition represents a 0. The problem is to find an
efficient communication scheme that will be delay-insensitive.

Let us represent the tracks with the numbers $1, 2, \ldots, n$. After
the $m$-th transition has arrived, the receiver obtains a sequence
$X_m = x_1, x_2, \ldots, x_m$, where $1 \le x_i \le n$, and $x_i$ represents
the fact that the $i$-th transition was received at the $x_i$-th wire. The
set $\{x_1, x_2, \ldots, x_m\}$ is the support (i.e., the set of non-zero
coordinates) of a vector and determines uniquely a binary vector.
Verhoeff [2] studied the following problem: assuming that a vector
$X$ is transmitted, once reception has been completed, the receiver
acknowledges receipt of the message. The next message is sent
by the sender only after the receipt of the acknowledgement. The
problem is finding a code $C$ whose elements are messages such that
the receiver can identify when transmission has been completed.
It is easy to see, as proved in [2], that the codes having the right
property are the so-called unordered codes, i.e., codes all of whose
elements are unordered vectors.

Here,  we  assume  that  there  is  no  communication  between  receiver 
and  sender  between  messages,  except,  perhaps,  when  errors  are 
detected.  The  sender  does  not  wait  for  acknowledgement  before 
sending  the  next  message.  This  makes  transmission  faster,  since 
the  waiting  period  between  messages  gets  shorter.  However,  if 
we  shorten  the  waiting  period,  transitions  from  Y  might  start  to 
arrive  before  reception  of  X  has  been  completed,  a  condition  called 
skew. 

In  [1],  coding  strategies  were  studied  that  allow  either  detection 
or  correction  of  skew  between  consecutive  messages.  Here,  our 
aim  is  to  study  codes  that  can  correct  a  certain  amount  of  skew 
between  messages,  and  detect  an  extra  amount  of  skew  when  the 
skew  correcting  capability  of  the  code  has  been  exceeded.  We 
generalize  the  results  in  [1]. 

Consider a transmitted vector $X$ followed by some other vectors,
giving a received sequence $Z$. There are two parameters that are
related to the skew. The first one, denoted $m(X; Z)$, denotes the
index of the last transition in $X$ before the occurrence of skew,
i.e., the last transition in $X$ before the arrival of either a transition
not in $X$ or a repeated arrival. The second one, denoted $r(X; Z)$,
denotes the index of the last arrival in $X$. If there is no skew,
$m(X; Z) = r(X; Z)$. We are ready now to define the concept of
skew of a vector $X$ with respect to a sequence $Z$.

Definition 1 Let $X$ be a subset of $\{1, 2, \ldots, n\}$ (equivalently, $X$
is a binary vector of length $n$). Let $Z = x_1, x_2, \ldots, x_i, \ldots$ be a
sequence whose elements are in $\{1, 2, \ldots, n\}$, $Z_i = x_1, x_2, \ldots,
x_i$ and $\bar{Z}_i$ the set corresponding to $Z_i$. Let $m = m(X; Z)$ and
$r = r(X; Z)$ be as defined above. We say that the skew of $X$ with
respect to $Z$ is equal to $(l_1, l_2)$ (notation, $S(X; Z) = (l_1, l_2)$), if
and only if

$$l_1 = |(\bar{Z}_r - \bar{Z}_m) \cap X| \quad \text{and} \quad l_2 = r - m - l_1.$$


Let $S(X; Z) = (l_1, l_2)$. We say that $S(X; Z)$ does not exceed $(s_1, s_2)$,
denoted $S(X; Z) \le (s_1, s_2)$, if $l_1 \le s_1$ and $l_2 \le s_2$.
Otherwise, we say that $S(X; Z)$ exceeds $(s_1, s_2)$ (notation, $S(X; Z) >
(s_1, s_2)$).
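
The skew of Definition 1 can be computed with a few lines of code. The sketch below follows one reading of the definitions: $m$ is the number of arrivals before the first arrival that is either not in $X$ or repeated, and $r$ is the position at which the last not-yet-seen element of $X$ arrives; if no skew occurs before $X$ is complete, $m = r$. The function name is illustrative.

```python
def skew(X, Z):
    """Return (l1, l2), the skew of X with respect to the arrival sequence Z."""
    X = set(X)
    Z = list(Z)
    seen = set()
    m = None          # index of the last arrival before skew first occurs
    r = 0             # index of the last new arrival belonging to X
    for idx, wire in enumerate(Z, start=1):
        if (wire not in X or wire in seen) and m is None:
            m = idx - 1
        if wire in X and wire not in seen:
            r = idx
        seen.add(wire)
        if X <= seen:             # reception of X is complete
            break
    if m is None or m > r:        # no skew before X was completed
        m = r
    l1 = len((set(Z[:r]) - set(Z[:m])) & X)
    l2 = r - m - l1
    return (l1, l2)


if __name__ == "__main__":
    print(skew({1, 2, 3}, [1, 4, 2, 3]))
    print(skew({1, 2, 3}, [1, 2, 1, 3]))
    print(skew({1, 2, 3}, [1, 2, 3]))
```

In the first example wire 4 arrives after one transition of $X$, so two elements of $X$ and one foreign transition arrive between positions $m = 1$ and $r = 4$. The second example shows a repeated arrival on wire 1 counted as skew.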

Definition 2 Let $t_1, t_2, s_1, s_2$ be 4 non-negative parameters and
let $C$ be a code. We say that $C$ is $(t_1, t_2)$-skew-tolerant (ST)
$(t_1 + s_1, t_2 + s_2)$-skew-detecting (SD) if, whenever a codeword $X$
in $C$ is transmitted followed by other codewords giving a received
sequence $Z$, then, by examining $Z$, the code will correctly decode
$X$ provided that $(0,0) \le S(X; Z) \le (t_1, t_2)$ and will detect the
occurrence of skew when $(t_1, t_2) < S(X; Z) \le (t_1 + s_1, t_2 + s_2)$.

Notice that a $(t_1, t_2)$-ST code is a $(t_1, t_2)$-ST $(t_1 + s_1, t_2 +
s_2)$-SD code such that $s_1 = s_2 = 0$, and an $(s_1, s_2)$-SD code is a
$(t_1, t_2)$-ST $(t_1 + s_1, t_2 + s_2)$-SD code such that $t_1 = t_2 = 0$
(compare with the definition of error correcting/detecting codes that can
correct up to $t$ errors and detect up to $t + s$ errors).

Given two binary vectors $X$ and $Y$ of length $n$, we denote by
$N(X, Y)$ the number of coordinates in which $X$ is 1 and $Y$ is 0.
The following is our main result:

Theorem 1 Let $C$ be a code and let $t = \min\{t_1, t_2\}$, $T = \max\{t_1,
t_2\}$, $s = \min\{s_1, s_2\}$, $S = \max\{s_1, s_2\}$, $\tau = \min\{t_1 +
s_1, t_2 + s_2\}$ and $\rho = \max\{t_1 + s_1, t_2 + s_2\}$. Then, $C$ is
$(t_1, t_2)$-ST $(t_1 + s_1, t_2 + s_2)$-SD if and only if, for any two
distinct codewords $X$ and $Y$ in $C$ such that $N(X, Y) \le N(Y, X)$, the
following is true:

(a) If $(t_1 - t_2)(s_1 - s_2) \ge 0$, then at least one of the following 3
conditions occurs:

1. $N(X, Y) \ge \tau + 1$.

2. $N(X, Y) \ge T + 1$ and $N(Y, X) \ge \rho + 1$.

3. $N(X, Y) \ge 1$ and $N(Y, X) \ge t_1 + t_2 + S + 1$.

(b) If $(t_1 - t_2)(s_1 - s_2) < 0$, then at least one of the following 4
conditions occurs:

1. $N(X, Y) \ge \tau + 1$.

2. $N(X, Y) \ge T + 1$ and $N(Y, X) \ge \rho + 1$.

3. $N(X, Y) \ge t + 1$ and $N(Y, X) \ge t_1 + t_2 + s + 1$.

4. $N(X, Y) \ge 1$ and $N(Y, X) \ge t_1 + t_2 + S + 1$.

We will prove that the conditions are sufficient by giving a decoding
algorithm and we also present codes satisfying the conditions.
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In [4], a coding solution to the problem of parallel asynchronous
communications was presented. After transmission of each codeword,
the receiver acknowledges reception of the message through
a handshake mechanism. In this way, skew between messages is
avoided. From a coding point of view, the problem is identifying
the end of a message. As pointed out in [4], the codes that
accomplish this task are the so-called unordered codes.

A  more  complicated  coding  situation  occurs  when  acknowledge¬ 
ment  of  the  message  is  not  allowed.  In  principle,  this  is  an  attrac¬ 
tive  alternative,  since  it  would  allow  pipelined  utilization  of  the 
channel,  with  increased  data  throughput.  However,  the  difficulty 
now is that there might be skew between messages, i.e., signals
from  a  second  transmitted  vector  may  arrive  before  the  current 
vector  has  been  completely  received. 

Necessary  and  sufficient  conditions  for  codes  that  can  either  detect 
or  correct  a  certain  amount  of  skew  were  given  in  [1].  For  further 
motivation  and  description  of  the  problem,  the  reader  is  referred 
to  [1,  4].  Here,  we  present  constructions  of  codes  that  can  detect 
or  tolerate  skew  below  a  certain  threshold. 

Given two binary vectors $X$ and $Y$ of length $n$, we denote by
$N(X, Y)$ the number of coordinates in which $X$ is 1 and $Y$ is 0.
In [1], theorems that characterize $(t_1, t_2)$-skew-detecting and
skew-tolerant codes were proven. Here we present the theorems in the
form of a definition as follows:

Definition  1  Let  and  <3  be  two  non-negative  integers,  and  let 
t  =  niin{f I, <3}  and  T  =  max{t|,  <3}.  We  say  that  a  binary  cmie  6’ 
of  length  7]  is: 

1. (t_1, t_2)-skew-detecting (SD) if and only if, for any pair of
distinct codewords X, Y ∈ C, at least one of the following
two conditions occurs:

(a) min{N(X, Y), N(Y, X)} ≥ t + 1

or

(b) min{N(X, Y), N(Y, X)} ≥ 1 and
max{N(X, Y), N(Y, X)} ≥ T + 1.

2. (t_1, t_2)-skew-tolerant (ST) if and only if, for any pair of
distinct codewords X, Y ∈ C, at least one of the following two
conditions occurs:

(a) min{N(X, Y), N(Y, X)} ≥ t + 1

or

(b) min{N(X, Y), N(Y, X)} ≥ 1 and
max{N(X, Y), N(Y, X)} ≥ t_1 + t_2 + 1.
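
The two conditions of Definition 1 can be checked mechanically for a small code. The following is a minimal sketch, not the authors' construction: the function names and the example code (the two-out-of-four constant-weight code) are illustrative assumptions, and codewords are represented as tuples of 0/1.

```python
from itertools import combinations, product


def n_count(x, y):
    """N(X, Y): number of coordinates in which X is 1 and Y is 0."""
    return sum(1 for a, b in zip(x, y) if a == 1 and b == 0)


def is_skew_detecting(code, t1, t2):
    """Check the (t1, t2)-SD condition of Definition 1 for every pair of codewords."""
    t, big_t = min(t1, t2), max(t1, t2)
    for x, y in combinations(code, 2):
        lo, hi = sorted((n_count(x, y), n_count(y, x)))
        if not (lo >= t + 1 or (lo >= 1 and hi >= big_t + 1)):
            return False
    return True


def is_skew_tolerant(code, t1, t2):
    """Check the (t1, t2)-ST condition of Definition 1 for every pair of codewords."""
    t = min(t1, t2)
    for x, y in combinations(code, 2):
        lo, hi = sorted((n_count(x, y), n_count(y, x)))
        if not (lo >= t + 1 or (lo >= 1 and hi >= t1 + t2 + 1)):
            return False
    return True


# Illustrative example (an assumption, not taken from the paper):
# all binary words of length 4 and weight 2, which form an unordered code.
two_of_four = [w for w in product((0, 1), repeat=4) if sum(w) == 2]
```

With t_1 = t_2 = 0 both conditions reduce to the unordered-code property, so `two_of_four` passes; with t_1 = t_2 = 1 it fails, because some pairs have N(X, Y) = N(Y, X) = 1.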

We present a general method for constructing (t_1, t_2)-SD and ST
codes. The procedure involves adding three tails to the information
bits: the first tail encodes the information bits into an
(n', k, 2t_1 + 2) error-correcting code; the second tail makes the code
satisfy the conditions in Definition 1; the third tail merely unorders
the code in a way analogous to the generalization of Berger's
construction given in [1]. We also briefly discuss optimality issues of
the constructions. More details can be found in [2].

Binary superimposed codes are considered. The superposition
mechanism assumed is addition modulo-2. Various
constructions and bounds are derived.

Introduction 

The idea of superimposed codes was introduced in 1964 by
Kautz-Singleton [1]. The application they had in mind was information
retrieval and the superposition mechanism assumed was the
Boolean sum. Chien-Frazer [2] considered the same problem
assuming modulo-2 addition as the superposition mechanism. Later
authors have usually emphasized the application of superimposed
codes in multiple-access communication [3],[4]. In the present
investigation we adhere to that view while adopting the same
superposition mechanism assumed by Chien-Frazer: addition
modulo-2.


The  problem 


Doubly  perfect  codes 

Let d be odd, d = 2t+1. It is clear that the size of the induced code
is subject to any upper bound for binary codes with distance d = 2t+1.
Therefore the parameters (n,m,d,T) of a superimposed code C are subject
to the following obvious bounds,

\sum_{i=0}^{m} \binom{T}{i} \le A(n, 2t+1) \le \frac{2^n}{\sum_{i=0}^{t} \binom{n}{i}}.


We say that a code C satisfying the first bound with equality is
perfect, because for such codes the induced code is as large as any
binary code with distance d = 2t+1. If both inequalities are satisfied
with equality we say that the superimposed code C is doubly perfect.


Let F denote the binary field and let C ⊆ F^n be an n-length
block-code over F. For any m, 0 ≤ m ≤ T = |C|, denote by C*_m the
set of all codewords x ∈ F^n which can be formed as a sum
x = x_1 + x_2 + ... + x_s of s distinct codewords x_i from C, where
0 ≤ s ≤ m. If they are all distinct - this is the case of interest to us -
it is clear that C*_m is a code of size

T*_m = |C*_m| = \sum_{i=0}^{m} \binom{T}{i}.

We say that the original code C is a superimposed (n,m,d,T)-code if
the induced code has minimum distance at least d: d(C*_m) ≥ d.
The problem is to choose C so as to obtain the best possible trade-off
between the parameters (n,m,d,T). A key result is the following.

Theorem 1: If for some k, 0 ≤ k ≤ n, there exists an [n,k,d]-code
and a [T,T-k,2m+1]-code, then there exists a superimposed
(n,m,d,T)-code.

Proof: Let G be a generator matrix of an [n,k,d]-code and let H be
a parity check matrix of a [T,T-k,2m+1]-code. A superimposed
(n,m,d,T)-code is given by the rows of the matrix H'G (H' denotes the
transpose of H). Indeed, any linear combination of rows from the
matrix H'G belongs to the [n,k,d]-code generated by G. Moreover,
any 2m rows from H'G are linearly independent because the
columns of H have exactly this property. Finally, it is obvious that
H'G has T rows and n columns.

□ 

Example: Let G generate the Hamming code [n,k,d] = [15,11,3]
and let H be a parity check matrix of the Golay code
[T,T-k,2m+1] = [23,12,7]. Theorem 1 produces a superimposed code
with parameters (n,m,d,T) = (15,3,3,23).
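
The H'G construction of Theorem 1 can be tried on a smaller instance than the Hamming/Golay example. The sketch below is an illustration under stated assumptions, not the authors' code: it takes G as a systematic generator matrix of the [7,4,3] Hamming code and H as a parity check matrix of the [5,1,5] repetition code (so k = 4, T = 5, m = 2), and all function names are hypothetical.

```python
from itertools import combinations


def mat_mul_gf2(a, b):
    """Product of two 0/1 matrices over GF(2)."""
    return [[sum(a[i][k] & b[k][j] for k in range(len(b))) % 2
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def induced_code(code, m):
    """All modulo-2 sums of s distinct codewords, 0 <= s <= m (with repetitions kept)."""
    n = len(code[0])
    words = []
    for s in range(m + 1):
        for subset in combinations(code, s):
            w = [0] * n
            for c in subset:
                w = [a ^ b for a, b in zip(w, c)]
            words.append(tuple(w))
    return words


def min_distance(words):
    """Minimum Hamming distance; 0 if two sums coincide."""
    return min(sum(a != b for a, b in zip(u, v))
               for u, v in combinations(words, 2))


# Assumed example matrices (not from the abstract).
G = [[1, 0, 0, 0, 1, 1, 0],   # generator matrix of the [7,4,3] Hamming code
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
H = [[1, 0, 0, 0, 1],         # parity check matrix of the [5,1,5] repetition code
     [0, 1, 0, 0, 1],
     [0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1]]

C = mat_mul_gf2(transpose(H), G)   # T = 5 rows, n = 7 columns
induced = induced_code(C, 2)       # 1 + 5 + 10 = 16 sums
```

In this instance the 16 sums are distinct and have minimum distance 3, so C is a superimposed (7,2,3,5)-code whose induced code is the whole Hamming code.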


Doubly perfect codes do exist. The above example with (n,m,d,T) =
(15,3,3,23) is one example. Further examples are as follows:


n               m                 d    T
11              3                 1    23
15              3                 3    23
23              6                 7    13
2^(2s+1) - 1    2^(2s) - s - 1    3    2^(2s+1) - 2s - 1    (s = 1, 2, 3, ...)
2m              m                 1    2m + 1               (m = 2, 3, ...)
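
A quick sanity check on these parameter sets is that the two outer expressions of the bound coincide, i.e. that the induced-code size equals the sphere-packing value 2^n divided by the volume of a radius-t ball. The sketch below verifies only this necessary arithmetic condition; it does not construct the codes, and the function names are illustrative assumptions.

```python
from math import comb


def induced_size(T, m):
    """Number of sums of at most m distinct codewords out of T."""
    return sum(comb(T, i) for i in range(m + 1))


def outer_bounds_meet(n, m, d, T):
    """True if sum_{i<=m} C(T,i) equals 2^n / sum_{i<=t} C(n,i), with d = 2t + 1."""
    t = (d - 1) // 2
    ball = sum(comb(n, i) for i in range(t + 1))
    return induced_size(T, m) * ball == 2 ** n


sporadic = [(11, 3, 1, 23), (15, 3, 3, 23), (23, 6, 7, 13)]
family_s = [(2 ** (2 * s + 1) - 1, 2 ** (2 * s) - s - 1, 3,
             2 ** (2 * s + 1) - 2 * s - 1) for s in range(1, 5)]
family_m = [(2 * m, m, 1, 2 * m + 1) for m in range(2, 8)]
```

For s = 1 the second family gives (n,m,d,T) = (7,2,3,5), where 1 + 5 + 10 = 16 = 2^7 / (1 + 7).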

References
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High-dimensional  Symmetric  Compacted  Code 

—  Error-correcting  of  high  bit  error  rate  of  10“^  ~  10~^  — 
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Gokiso-cho,  Showa-ku,  Nagoya,  466  Japan 


Concept  of  new  code 

In this paper, a long block code is conceived as a code string
that is wound up into a compact knot in an n-dimensional
torus space 'P' (Ref. 1, 2). The code string winds diagonally
with respect to each of the fundamental cycles of the n-dimensional
torus. The digits on the code string are therefore scattered over
the fundamental cycles, the time and space distances between
digits on each cycle and among the cycles are greatly expanded,
and independence between digits and among cycles is
assured.

The digits on each fundamental cycle form a single unit
code. Because of this independence, the unit codes are mutually
independent with respect to transmission errors. We can
therefore model an erroneous route as an error screen
whose interval is the inverse of the mean bit error
rate of the route. If the size of the unit code of a fundamental cycle
is smaller than this interval, the code can pass through without
being hit by the error screen.

Consequently, the fundamental cycles of the code should all be
designed with the same size in order to optimize
robustness, and the code acquires geometrical symmetries
like those of a crystalline ball.

Example  of  proposed  code 

For this paper, the unit code constituting each dimension
is a simple short parity check code of the same
length m. The code is formed as the n-fold orthogonal product of
cyclic-shifted versions of one-dimensional parity check codes,
which exhibits the structural symmetries and
satisfies the n-dimensional parity check functions. The burst error
correcting ability of the code is (m - 1)m^(n-2) in length, with
transmission rate R = (1 - 1/m)^n and code length m^n. The random
error correcting ability also increases appreciably with
the dimension n, especially for n > 4, exceeding the minimum
distance limit of the code of (2^n/2 - 1). The proposed code is
defined as a quasi-cyclic code. The uncorrectable error patterns
appear in the space as a solid that is symmetrical with respect to
the parity check axes, and this solid is the same for both burst and
random errors.

Results 

By computer simulations, we could show that the code may be
applicable to a worst-case bit error rate of the order of 10^-1 ~
10^-2 and has a better decoding bit error rate for the range of
transmission  rate  of  ^  ~  J  than  the  convolution  code. 

As an example, the code with m = 5, n = 5, of length m^n = 3,125
and R = 0.32768 ≈ 0.33, can almost perfectly correct 150 random
errors, which corresponds to a bit error rate of 4.8 x 10^-2, and 200
errors with 99% correction, and can correct a burst error of 500
bits in length.
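
The figures quoted in this example follow from the parameters m and n. The sketch below recomputes them; it assumes the code is the n-fold product of single parity check codes of length m, so that the rate is ((m - 1)/m)^n, and it assumes the burst length expression (m - 1)m^(n-2), which is consistent with the 500-bit figure but is an inference from the garbled source. The function name is hypothetical.

```python
def compacted_code_parameters(m, n):
    """Length, information digits, rate and assumed burst length for unit length m, dimension n."""
    length = m ** n
    info_digits = (m - 1) ** n             # product of n single parity check codes
    rate = info_digits / length            # equals (1 - 1/m) ** n
    burst = (m - 1) * m ** (n - 2)         # assumed reconstruction of the burst formula
    return {"length": length, "info_digits": info_digits,
            "rate": rate, "burst": burst}


params = compacted_code_parameters(5, 5)
random_error_ber = 150 / params["length"]  # 150 random errors in one block
```

For m = n = 5 this gives length 3125, rate 0.32768, a burst length of 500 and a bit error rate of 0.048 for 150 errors per block.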

A similar code, the n-D cyclic code, was proposed in the late
1960s and has recently attracted renewed interest for practical reasons
of decoding algorithms and performance. However, those efforts
are confined to the two-dimensional case, and codes of dimension
higher than two have not yet been explored. Further, the n-D cyclic
code consists of dimensions whose lengths differ and
are relatively prime to each other, so it has no geometrical symmetries.
Therefore, the proposed code will be superior to the n-D cyclic


Fig. 2  :  Torus  representation  of  two  and  three-dimensional 
proposed  code 


(A)  3D-RC  (B) 

Fig. 3  :  Undetectable  solid 


Table  1  :  Performances  of  proposed  code 
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code's  operability  for  worse  BER  conditions. 
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Some  Families  of  Asymptotically 
Optimal  Optical  Orthogonal  Codes 


O.  Moreno,  Z.  Zhang  and  P.  V.  Kumar* 


Abstract  Three  related  constructions  for  families  of  optical 
orthogonal  codes  are  presented.  All  are  asymptotically  optimum 
in  the  sense  that  in  each  case,  as  the  length  of  the  sequences 
within  the  family  approaches  infinity,  the  ratio  of  family  size 
to  the  maximum  possible  under  the  Johnson  bound,  approaches 
unity. 


An (n, ω, λ)-optical orthogonal code (OOC) (see [1], [2]) C,
n > 1, 1 < ω < n, is a family of {0,1}-sequences
of length n and Hamming weight ω satisfying the following auto-
and cross-correlation conditions:

\sum_{k=0}^{n-1} x(k)\, x(k \oplus_n \tau) \le \lambda    (1)

for all sequences x(·) ∈ C and all integers τ ≢ 0 (mod n), and

\sum_{k=0}^{n-1} x(k)\, y(k \oplus_n \tau) \le \lambda    (2)

for all pairs of sequences x(·), y(·) ∈ C and all integers τ, where
⊕_n denotes addition modulo n.

All three constructions make use of the following two ideas.
Let n be an integer that can be expressed as the product n =
n_1 n_2 of two relatively prime integers n_1 and n_2. Then, from an
application of the Chinese remainder theorem, it follows that the
construction of sets of {0,1} sequences with periodic correlation
bounded above by λ is completely equivalent to the task of
constructing a collection of arrays whose doubly-periodic correlation
is bounded above by λ. Secondly, the sequences in the OOC are
required to have constant weight. The sequences in each of the
three families A, B and C, when represented in matrix form, appear as
the graph of a function mapping Z_{n_2} → Z_{n_1}. This guarantees
that they all have constant weight (approximately) n_2. The
functions in A and B are polynomials, whereas construction C
uses rational functions.

Precise parameters of the three families constructed are tabulated
below. Reference [4] appeared after the initial preparation
of this paper. The two papers share some material in common,
such as the idea behind the construction as well as some features
of construction A.
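
The correlation conditions (1) and (2) can be tested directly on small examples. The following is a minimal sketch under stated assumptions: the cross-correlation condition is applied to pairs of distinct sequences, the function names are hypothetical, and the two example families (supports {0,1,3} mod 7 and {0,1,4}, {0,2,7} mod 13) are standard small difference-family examples, not the constructions A, B or C of the paper.

```python
def periodic_correlation(x, y, tau):
    """Sum over k of x(k) * y(k + tau mod n)."""
    n = len(x)
    return sum(x[k] * y[(k + tau) % n] for k in range(n))


def is_ooc(code, w, lam):
    """Check weight w, condition (1) for tau != 0 and condition (2) for distinct pairs."""
    n = len(code[0])
    for x in code:
        if len(x) != n or sum(x) != w:
            return False
        if any(periodic_correlation(x, x, tau) > lam for tau in range(1, n)):
            return False
    for i, x in enumerate(code):
        for y in code[i + 1:]:
            if any(periodic_correlation(x, y, tau) > lam for tau in range(n)):
                return False
    return True


def indicator(n, support):
    """{0,1}-sequence of length n with ones at the given positions."""
    return [1 if k in support else 0 for k in range(n)]


ooc_7 = [indicator(7, {0, 1, 3})]
ooc_13 = [indicator(13, {0, 1, 4}), indicator(13, {0, 2, 7})]
not_ooc = [indicator(7, {0, 1, 2})]   # difference 1 occurs twice
```

Both `ooc_7` and `ooc_13` satisfy the conditions with λ = 1, while `not_ooc` violates the auto-correlation condition at τ = 1.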

For a given set of values of n, ω and λ, let Φ(n, ω, λ) denote
the largest possible cardinality of an (n, ω, λ)-optical orthogonal
code. Upper bounds for this function and several optimal
constructions for λ = 1 and 2 can be found in [1]-[3]. An easy upper
bound derived from the Johnson bound (see [1]) states that

\Phi(n, \omega, \lambda) \le \left\lfloor \frac{1}{n} A(n, 2\omega - 2\lambda, \omega) \right\rfloor \le \left\lfloor \frac{(n-1)(n-2)\cdots(n-\lambda)}{\omega(\omega-1)\cdots(\omega-\lambda)} \right\rfloor    (3)

In this paper, three constructions (A, B and C) for families
of OOC's are presented. In every case, the families are asymptotically
optimum in the sense that, as the length of the sequences in the
family → ∞, the ratio of the size of the OOC to the
maximum permissible as determined by the bound in (3) above
approaches unity.
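
The right-hand side of bound (3) is easy to evaluate. The sketch below computes its integer part; the function name is an illustrative assumption.

```python
def johnson_ooc_bound(n, w, lam):
    """Floor of (n-1)(n-2)...(n-lam) / (w(w-1)...(w-lam))."""
    numerator, denominator = 1, w
    for i in range(1, lam + 1):
        numerator *= n - i
        denominator *= w - i
    return numerator // denominator
```

For example, with λ = 1 and ω = 3 the bound gives 1 for n = 7 and 2 for n = 13, so any (7,3,1)-OOC with one sequence or (13,3,1)-OOC with two sequences meets it.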
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Abstract 

We investigate the single bit insertion/deletion correcting codes
as proposed by Levenshtein and others [1-4, 6]. These
synchronization error correcting codes are derived from number
theoretic constructions. The weight spectra and Hamming
distance properties of these codes are found and a relationship
between these properties is established. This relationship is
extended to codes that can correct multiple random
synchronization errors. From this general relationship,
improved bounds on the cardinality of such multiple
synchronization error correcting codes are found. From the
new relationship between the weight and Hamming distance of
synchronization error correcting codes, several new codes are
found.

Introduction 

In 1965 Levenshtein [1,2] found that a certain code construction
technique developed by Varshamov and Tenengol'ts [3] could
also yield codes that are capable of correcting single
synchronization errors. This work was later extended by
several workers [4, 6]. Synchronization errors manifest
themselves in the bit stream as the deletion of a valid symbol or
the insertion of such a symbol. We first investigated the
binary single error correcting codes as developed by Levenshtein
to determine some new properties. The weight spectra and the
Hamming distance profiles for several short length codes were
determined. The dc-free subcodes of the Levenshtein codes
were investigated, as well as runlength limited concatenatable
subsets that have a minimum runlength constraint of 1, i.e.
there is at least one zero between ones. The spectra of some of
the dc-free codes are also presented.
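
For readers who want to experiment with the codes discussed here, the sketch below enumerates a Varshamov-Tenengol'ts code in its commonly stated form, namely all binary words x of length n with x_1 + 2 x_2 + ... + n x_n ≡ a (mod n + 1), and checks single-deletion correction by verifying that the deletion balls of distinct codewords are disjoint. This form of the construction is an assumption on our part (the abstract does not spell it out), and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def vt_code(n, a=0):
    """Binary words of length n whose weighted sum is congruent to a modulo n + 1."""
    return [w for w in product((0, 1), repeat=n)
            if sum(i * b for i, b in enumerate(w, start=1)) % (n + 1) == a]


def deletion_ball(w):
    """All words obtained from w by deleting exactly one symbol."""
    return {w[:i] + w[i + 1:] for i in range(len(w))}


def corrects_single_deletion(code):
    """True if no two codewords share a word after one deletion each."""
    return all(deletion_ball(u).isdisjoint(deletion_ball(v))
               for u, v in combinations(code, 2))


def weight_spectrum(code):
    """Number of codewords of each Hamming weight."""
    return Counter(sum(w) for w in code)
```

For n = 4 and a = 0 the code consists of 0000, 1001, 0110 and 1111, so its weight spectrum has one word of weight 0, two of weight 2 and one of weight 4.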

New  bounds  on  the  cardinality 

From the investigation of the Levenshtein codes, the
relationship between the weight of a codeword and the
Hamming distance between other Levenshtein codewords of
specific weights was determined. From this relationship follow
several propositions that establish a similar relationship for
codes that are capable of correcting two synchronization errors
when using a number theoretic construction technique similar to
that of Levenshtein. By using the abovementioned
relationships we establish the following new upper and lower
bounds on the cardinality of double synchronization error
correcting codes:

Upper bound:

|C| \le 2 + \sum_{w} \frac{n}{w}\, A(n-1, 6, w-1), \quad n > w > 1    (1)

where w is the weight (i.e. the number of "ones") of the
codeword of length n and A(n,d,w) is the number of
codewords of weight w which differ from each other in at least d
positions. The bound in (1) is due partly to Johnson [5], who
derived an upper bound for constant weight codes with a certain
minimum Hamming distance.

Lower  bound: 

|C|>2"/E  ('-/)  (2) 
which is a Gilbert-Varshamov type lower bound on the
cardinality of linear error correcting codes. These new bounds
are compared to the known bounds in Figure 1.


Figure 1: Bounds on the cardinality of double synchronization
error correcting codes.

New  codes 

We  also  present  general  construction  techniques  for  codes  that 
are  capable  of  correcting  two  adjacent  synchronization  errors 
and  two  random  synchronization  errors  respectively.  The 
cardinality  of  these  codes  for  short  word  lengths  is  also  given. 
By applying certain dc-free criteria to the balanced subcodes
of  the  Levenshtein  codes,  we  found  a  class  of  "multipurpose" 
codes  which  have  enhanced  dc  suppression,  are  able  to  correct 
either  one  synchronization  error  or  one  additive  error  and 
require  less  bandwidth  than  similar  codes. 

References 

[1] V. I. Levenshtein, "Binary codes capable of correcting deletions,
insertions and reversals," (Russian:) Doklady Akademii Nauk
SSSR 163(4), pp 845-848, 1965; (English:) Soviet
Physics-Doklady 10(8), pp 707-710, 1966.

[2] V. I. Levenshtein, "On perfect codes capable of correcting
deletions of a character," Fourth Joint Swedish-Soviet International
Workshop on Information Theory, August 27 -
September 1, 1989, Gotland, Sweden, pp 199-204.

[3] R. R. Varshamov and G. M. Tenengol'ts, "On asymmetrical
error-correcting codes," (In Russian), Avtomatika i
Telemehanika, vol 26, N2, pp 288-292, 1965.

[4] H. D. L. Hollmann, "A relation between Levenshtein-type
distances and insertion-and-deletion correcting capabilities of
codes," Internal report, Philips Research Laboratories,
Eindhoven, The Netherlands, November 9, 1990.

[5] S. M. Johnson, "Improved asymptotic bounds for error-correcting
codes," IEEE Transactions on Information Theory,
vol IT-17, pp 198-205, July 1963.

[6] P. A. Bours, "Bounds for codes that are capable of correcting
insertions and deletions," Internal report, Technische Universiteit
Eindhoven, Eindhoven, The Netherlands, July 1991.




The  Two-way  Channel  as  a  Computer  Game 


Alphons H. A. Bloemen, Hendrik B. Meeuwissen, and J. Pieter M. Schalkwijk
Group on Information and Communication Theory, Eindhoven University of Technology
PO Box 513, 5600 MB Eindhoven E-mail: phons@ei.ela.tue.nl


Abstract 

In his 1961 paper on two-way channels (TWC's) Shannon derived
single-letter inner and outer bounds to the capacity region.

The first part of this paper is a survey of earlier results on TWC's
in general and the binary multiplying channel (BMC) in particular.
The second part is devoted to a new approach to the problem
of determining the capacity region of the BMC. Based on
Schalkwijk's 1982 idea to represent symmetric, R_1 = R_2 = R, coding
strategies for deterministic two-way channels as progressive
subdivisions of an M x M square, we developed a computer game
as a development environment for new coding strategies. Playing
A.XS is simple and requires no background in information theory.

1  History 

In order to approximate the capacity region of a TWC we start off with
Shannon's [8] observation (1961) that the capacity region can be found
from the per-letter rate of increasingly long coding strategies. An initial
hurdle was the fact that coding strategies, where the code sequence at
each terminal depends not only on the message Θ being transmitted but
also on the received sequence Y at that terminal, are hard to visualize.

A breakthrough [5] was made in 1982, when it was discovered that
coding strategies for deterministic TWC's could be considered as
strategies for subdividing the unit square. For the BMC a subdivision using
three types of resolutions (to be referred to as i-, m- and o-resolutions)
was found. In the case of equal rates in both directions this constructive
coding strategy achieves 0.61914, in excess of Shannon's inner bound of
0.61695. Dueck [2] had shortly before proved by example that the capacity
region of a TWC is in general larger than its inner bound.
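
The inner-bound value 0.61695 quoted above can be reproduced numerically. The sketch below assumes the usual description of the BMC, Y = X_1 X_2, with independent inputs and P(X_1 = 1) = P(X_2 = 1) = p; then I(X_1; Y | X_2) = p h(p), where h is the binary entropy function, and the equal-rate point of the inner bound is the maximum of p h(p). The function names and the grid search are illustrative choices.

```python
from math import log2


def binary_entropy(p):
    """h(p) in bits, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def bmc_equal_rate(p):
    """I(X1; Y | X2) for Y = X1 * X2 with independent inputs, P(X = 1) = p."""
    return p * binary_entropy(p)


def shannon_inner_bound(steps=100000):
    """Grid search for the p that maximises p * h(p) on [0, 1]."""
    best_p = max((i / steps for i in range(steps + 1)), key=bmc_equal_rate)
    return best_p, bmc_equal_rate(best_p)


best_p, best_rate = shannon_inner_bound()
```

The maximum is attained near p = 0.7035 and agrees with the quoted 0.61695 to the digits shown.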

A further [6] basic step was taken a year later, in 1983. The
m-resolution in the strategy [5] mentioned above was not efficient. This
m-resolution takes place in an L-shaped subregion. By collecting a number
of those L-shapes, the total resolution information that has to be sent
from terminal 1 to terminal 2 and vice versa can be accumulated at
each terminal. In the limit, the total resolution information can be
transferred at the very rate of the resulting strategy using a technique
[6] called bootstrapping, thus boosting the 0.61914 rate of the original
strategy up to 0.63056. The resulting strategy effectively resembles
our original strategy where the m-resolution has been eliminated. As
this equivalent strategy is very simple and elegant, 0.63056 was initially
thought to be the equal rate capacity of the BMC.

However, repeated attempts to find a converse failed and suspicion
regarding optimality arose. Accurate calculation of the rate of the
bootstrapped strategy yields 0.6305552557. Finally, an improvement to
0.6305552986 was found, by having two initial i-resolutions and
preserving the efficiency of the postponed o-resolution by a transparency
condition. Another [7] slight improvement yields the tightest lower bound
to the equal rate capacity to date, 0.6305552995.

Shannon's [8] upper bound of 0.69424 has been tightened by Zhen
Zhang et al. [11] to 0.64891 for general TWC's. The tightest upper
bound as of now for T-channels (TC's), i.e. channels with two inputs
and a single common output, found by Hekstra and Willems [3], yields
0.64628 for the BMC. It is our belief that the final result, at least for
the BMC, is closer to the best inner bound of 0.6305552995. In order
to find better upper bounds it will be necessary to consider the coding
strategies in greater detail.

In classical one-way communication we distinguish [1] between the
information theoretic and the operational capacity. Shannon's channel
coding theorem shows these two capacities to be equal. The achievable
rates of our original strategy [5] and of the bootstrapped strategy [6]
are information theoretic rates. It was reasoned that these rates were
also operational as they related to the size of resolution products in the
unit square. Rigorous proofs of the operationality of the rates in [5] and
[6] were given by Tolhuizen [9] and van Overveld [10], respectively.

2  The  ^A'5-program 

There are several reasons for trying to find the capacity region of the
BMC. First, the BMC was used as the simplest non-trivial example of a
TWC by Shannon in his original 1961 paper. This [8] paper marks the
beginning of network information theory, see also Cover [1, chapter 14].
Second, once the BMC is solved, it seems likely that similar methods can
be applied to solve all deterministic TWC's. Finally, it does not seem
possible to make any progress on general (non)deterministic TWC's as
long as the easier deterministic TWC's are not completely understood.
As an example, there are 322 distinct [4] ternary deterministic TWC's,
of which 46 are T-channels.

Note that the best [6] lower bound of 0.6305552995 to the equal
rate capacity is the result of a continuing line of research building upon
our very first [5] strategy. Continuation of this research line does not
look promising: it will result in more complicated strategies with more
parameters and smaller improvements, and maybe there is an entirely
different class of strategies that perform better. Computer search is
infeasible, since the number of possible strategies dwarfs Avogadro's
number of 6.02 · 10^23 for relatively small message sets. However, people
seem to have a feeling as to the proper shape of resolution products. To
help people construct strategies for binary and ternary TWC's on M x
M squares, F. Hantz and A. Bloemen developed the computer puzzle
game A.YS. The 'game' is played by editing resolutions already stored
in the computer as part of a strategy tree. Any improvements replace
the resolutions currently stored. No information theoretical
background is needed to play the game. By putting the game in the public
domain we hope to get some good coding strategies.

The results obtained so far look very promising. Early results were
found by J. van der Leur, who developed a coding strategy resolving
all message pairs of a 17 x 17 square with a rate of 0.61079. A first
tuning step is the save-up method: square and rectangle-like resolution
products are not resolved, but they are put together into a new M x
M square. This technique already provides a coding strategy on the
6 x 6 square with rate 0.61795, and on the 11 x 11 square with rate
0.61984. These rates exceed Shannon's [8] inner bound rate of 0.61695
and Schalkwijk's [5] original rate 0.61914, respectively. The second step
is bootstrapping, similar to [6], to weed out inefficient resolutions. For
a 13 x 13 square, a discrete bootstrap strategy with rate 0.63050 has
been constructed. The third step, transforming a discrete bootstrap
strategy into a continuous bootstrap strategy, will almost surely yield a
new lower bound.
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Quasi-Cyclic  (QC)  codes  are  a  generaliza¬ 
tion  of  cyclic  codes  whereby  a  cyclic  shift 
of  a  codeword  by  p  positions  results  in  an¬ 
other  codeword.  The  class  of  QC  codes  is 
of  interest  because  it  contains  many  of  the 
best  known  binary  linear  codes.  The  results 
presented  here  reinforce  the  statement  that 
‘Quasi-Cyclic  codes  are  good’. 

The  blocklength,  n,  of  a  QC  code  is  a  mul¬ 
tiple  of  p,  n  =  mp.  Many  of  the  results  on 
QC  codes  presented  in  the  literature  concern 
those  for  which  a  generator  matrix  can  be 
constructed  from  m  x  m  circulant  matrices, 
(with  a  suitable  permutation  of  coordinates). 
In  this  case  the  generator  matrix  can  be  rep¬ 
resented  as 

G = (c_0(x), c_1(x), c_2(x), c_3(x), ..., c_{p-1}(x)),

where the coefficients of the polynomial c_i(x)
are defined by the circulant matrix C_i. G generates
a (pm, m) code, and the dual code H is a
(pm, (p - 1)m) code. To date most of the
results on QC codes are concerned with these
rate 1/p and (p - 1)/p codes. In this paper
a generalization of the rate 1/p codes to rate
(m - r)/pm codes is presented based on the
theory of 1-generator QC codes, which are
a sub-class of QC codes. The order of a
1-generator QC code, V, is defined as

h(x) = \frac{x^m - 1}{\gcd(x^m - 1, c_0(x), c_1(x), \ldots, c_{p-1}(x))},

and k, the dimension of V, is equal to the degree
of h(x). If gcd(x^m - 1, c_i(x)) = 1, the dimension
of V is m, and G is a generator matrix
for V. If deg(h(x)) = k < m, a generator
matrix for V can be constructed by deleting
r = m - k rows of G. In this paper, codes are
constructed with deg(h(x)) < k - 1.

'This research was supported in part by the Nat¬
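
The order h(x) and the dimension k = deg h(x) can be computed with a few lines of GF(2) polynomial arithmetic. The sketch below is illustrative only: polynomials are stored as Python integers whose bit i is the coefficient of x^i, the function names are hypothetical, and the example polynomials for m = 7 are our own choice.

```python
def gf2_mod(a, b):
    """Remainder of a(x) divided by b(x) over GF(2); b must be nonzero."""
    while a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a


def gf2_gcd(a, b):
    """Greatest common divisor of two GF(2) polynomials."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a


def gf2_div(a, b):
    """Quotient of a(x) divided by b(x) over GF(2); b must be nonzero."""
    q = 0
    while a.bit_length() >= b.bit_length():
        shift = a.bit_length() - b.bit_length()
        q |= 1 << shift
        a ^= b << shift
    return q


def order_polynomial(m, circulant_polys):
    """h(x) = (x^m - 1) / gcd(x^m - 1, c_0(x), ..., c_{p-1}(x)) over GF(2)."""
    x_m_minus_1 = (1 << m) | 1      # x^m + 1, the same as x^m - 1 over GF(2)
    g = x_m_minus_1
    for c in circulant_polys:
        g = gf2_gcd(g, c)
    return gf2_div(x_m_minus_1, g)


def dimension(m, circulant_polys):
    """k = deg h(x)."""
    return order_polynomial(m, circulant_polys).bit_length() - 1
```

For m = 7 and c_0(x) = c_1(x) = x^3 + x + 1 (`0b1011`), the gcd is x^3 + x + 1, so h(x) = x^4 + x^2 + x + 1 and k = 4; replacing c_1(x) by x^3 + x^2 + 1 (`0b1101`) makes the gcd equal to 1 and k = m = 7.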

Linear  programming  is  efficient  for  finding 
optimal  codes  if  k  is  small.  However,  an  ex¬ 
haustive  search  quickly  becomes  intractable 
as  k  increases.  Therefore  non-exhaustive 
techniques  must  be  employed  to  search  for 
good  codes.  A  greedy  exchange  algorithm  has 
previously  been  used  with  good  results  be¬ 
cause  it  is  computationally  simple  and  there¬ 
fore  able  to  cover  a  large  number  of  codes 
quickly,  and  so  was  also  used  in  this  case. 
Although  the  resulting  codes  are  not  guaran¬ 
teed  to  be  optimal,  they  can  be  compared 
with  known  bounds  to  determine  if  a  bet¬ 
ter  code  can  exist.  The  results  of  this  search 
are ten codes which improve the known lower
bounds on the minimum distance of binary
linear codes as tabulated by Verhoeff.
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It  is  shown  in  this  paper  that  there  exists  a  class  of  codes 
generated  by  the  known  self-dual  binary  extended  quadratic  residue 
(EQR) codes. Each code in this new class has an even better
error-control rate than its "parent" binary EQR code. Asymptotically,
both the information rate and the error control rate of these new
codes are shown to be bounded away from zero so that they
represent "good" codes.

The  Self-dual  Subclass  of  the  Binary  EQR  Codes 

Let Q denote the set of quadratic residues modulo a prime
integer n, i.e. let Q = {i^2 (mod n) | i ∈ GF(n), i ≠ 0}. Furthermore,
define g(x) = \prod_{i \in Q} (x - \alpha^i), where α is a primitive n-th
root of unity in an extension field of GF(2). The cyclic code of length n
over GF(2) with the generator polynomial g(x) is called a binary
QR code and denoted by Q.

Let C denote the extended code of the binary QR code Q. Since Q
has an odd minimum distance d, C has the minimum distance d + 1.
Evidently, such an extended code C is of the form (n+1, (n+1)/2, d+1).
Thus, the information rate of C is 1/2.

Next,  the  self-dual  class  of  the  extended  binary  QR  codes  is  ob¬ 
tained. 

Definition 1: Let C^⊥ denote the dual code of C. If C^⊥ = C,
the code C is called a self-dual code. Furthermore, if all weights are
divisible by 4, the code C is called a doubly even self-dual code.

Lemma 1: Let C be a binary EQR code (n+1, (n+1)/2, d+1).
Then if n = 8m - 1, C is a doubly even self-dual code.
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Table  1:  A  list  of  codes  of  this  new  class 


R(N) and E(N) denote the information rate and the error-control
rate of the new code, respectively, i.e. let R(N) = K/N and E(N) =
D/N. Next, define the limit superior and limit inferior of E(N) to be,
respectively,

\delta_u = \limsup_{M \to \infty,\, N > M} E(N), \quad \delta_l = \liminf_{M \to \infty,\, N > M} E(N).    (1)

In a similar manner for R(N), define

R_u = \limsup_{M \to \infty,\, N > M} R(N), \quad R_l = \liminf_{M \to \infty,\, N > M} R(N).    (2)

Finally, for each real number δ_l ≤ δ ≤ δ_u let

R(\delta) = \sup \{ \liminf_{N \to \infty} R(N, d_N) \}    (3)

A New Class of Binary Linear Codes

A  construction  theorem  for  the  new  codes  is  given  in  this  sec¬ 
tion. 

Theorem 1: For a given binary self-dual EQR code
(n+1, (n+1)/2, d+1), there exists a binary linear code (N, K, D) such that

N = n - d, K = (n+1)/2 - d, and D ≥ d + 1.
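
The parameters promised by Theorem 1 are simple to tabulate. The sketch below computes N, K and the guaranteed lower bound on D from the parameters (n, d) of the underlying QR code, together with the two rates, using K/N for the information rate and D/N for the error-control rate (with D replaced by its lower bound d + 1). The function names and the example input, the extended Golay code (24, 12, 8) with n = 23 and d = 7, are illustrative assumptions.

```python
from fractions import Fraction


def punctured_parameters(n, d):
    """(N, K, lower bound on D) from Theorem 1 for a self-dual EQR code (n+1, (n+1)/2, d+1)."""
    return n - d, (n + 1) // 2 - d, d + 1


def rates(N, K, D):
    """Information rate K/N and error-control rate D/N as exact fractions."""
    return Fraction(K, N), Fraction(D, N)


N, K, D_min = punctured_parameters(23, 7)   # from the extended Golay code (24, 12, 8)
R, E = rates(N, K, D_min)
```

For the Golay example this gives (N, K, D) = (16, 5, D ≥ 8), with R = 5/16 and E ≥ 1/2, compared with an error-control rate of 8/24 = 1/3 for the parent code.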

An infinite class of binary linear codes is constructed by means
of Theorem 1. The construction technique of these codes consists
primarily of the following steps. First, certain d+1 linearly dependent
columns are found in the generator matrix of a self-dual binary EQR code
(n+1, (n+1)/2, d+1). Then the columns and rows associated with
d of these marked columns of the generator matrix
are punctured and deleted. Finally, one more column is removed to
leave a matrix which is the generator matrix of the new code. The
parameters of several new codes which are generated by Theorem 1
are listed in Table 1. In this table, the given distance D is found
by a computer search. The result shows that the error control rate
of each new code is better than that of its corresponding self-dual
extended  binary  QR  code.  The  information  rate  of  each  new  code 
satisfies  the  inequalities  4  ^  ^ 


denote the outer supremum taken over all sequences {d_N} for which d_N/N → δ and R(N, d_N) = N^{-1} log_2 M(N, d_N), where M(N, d_N) is the largest possible number of codewords in a code of length N with a minimum distance of at least d_N.

The main theorem on the asymptotic bounds for the new punctured subcode is presented as follows.

Theorem 3: Let (N, K(N), D(N)) denote the new punctured code developed in Theorem 1. Then

1) liminf_{M→∞, N>M} E(N) > 0.1236; (4)

2) R_u < 0.4382, R_l > 0.4 (5)

where R_u and R_l are defined in (2); and

3) R(N) > 1/2 − (1/2) E(N). (6)

Corollary 3 (Asymptotic bounds). For the new punctured code (N, K(N), D(N)) of Theorem 1 one has the following inequalities:


Asymptotic Bounds

The  asymptotic  bounds  on  the  information  rate  and  the  error- 
control  rate  of  the  new  codes  are  found  in  this  section.  To  describe 
these  bounds  compactly,  the  following  notation  is  used.  First,  let 
0.4 < R(N) < 0.4382, (7)

0.1236 < E(N) (8)

and

1/2 − (1/2) E(N) < R(N) < R(δ) (9)

for N sufficiently large, where R(δ) is defined in (3).

The covering polynomial method of decoding cyclic codes is
described  in  [1]  and  is  essentially  a  modification  of  error  trapping. 
While  this  method  is  a  simple  and  effective  way  to  decode  many 
cyclic  codes,  determining  the  optimum  set  of  covering  polynomials 
for  general  code  length  n,  code  dimension  k,  and  error  weight  r  is 
not  an  easy  problem  (it  is  Research  Problem  16.12  in  MacWilliams 
and  Sloane  [2]). 


When the rate of a code satisfies R < 2/τ, all error patterns can be trapped by monomials, and Wei [4] presented a smallest covering set for this class of codes when τ = 2 or 3. We extend Wei's result to higher τ, and propose an algorithm to find optimum covering sets.


Proposition 1: For (n, k, τ) binary cyclic codes with rate R < 2/τ, the number of optimal covering monomials for even τ is given by


c 


p 


-1-  1 ,  where  p 


k 


Further,  {  0,  f?! ,  f?! 

^n-*-i+('a1+(e_i)p  j  is  a  covering  set. 


Note that the optimal monomials chosen above are all located in one half of the information positions. The number of covering polynomials is very much dependent on the interval patterns, which represent how error bits are spaced in an error vector. Two error patterns are said to have the same interval pattern if they are cyclic shifts of each other. Let v = (v_1, v_2, ..., v_τ) denote the interval pattern, and v_m = max_i v_i. Then ⌈n/τ⌉ ≤ v_m ≤ n − (τ − 1).
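As a small illustration of interval patterns, the sketch below computes the cyclic gaps between error positions and compares two error patterns up to cyclic shift. The function names and the choice of a rotation-minimal canonical form are assumptions made for this example only; they are not taken from the paper.

```python
from math import ceil


def interval_pattern(positions, n):
    """Cyclic gaps between consecutive error positions in a length-n vector."""
    pos = sorted(set(p % n for p in positions))
    t = len(pos)
    # A single error has one interval of length n (the whole cycle).
    return tuple((pos[(i + 1) % t] - pos[i]) % n or n for i in range(t))


def canonical(pattern):
    """Rotation-minimal representative, so cyclic shifts compare equal."""
    return min(pattern[i:] + pattern[:i] for i in range(len(pattern)))


def same_interval_pattern(a, b, n):
    """True if error patterns a and b share an interval pattern up to rotation."""
    return canonical(interval_pattern(a, n)) == canonical(interval_pattern(b, n))


n = 15
errors = [0, 3, 7]
shifted = [(p + 10) % n for p in errors]
v = interval_pattern(errors, n)
print(v, same_interval_pattern(errors, shifted, n))
# The largest interval lies between ceil(n / t) and n - (t - 1).
print(ceil(n / len(v)) <= max(v) <= n - (len(v) - 1))
```

The intervals always sum to n, which is why the largest one can be neither smaller than ⌈n/τ⌉ nor larger than n − (τ − 1).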

Proposition 2: For (n, k, τ) binary cyclic codes with rate R < 2/τ, the following procedure gives covering monomials that are sufficient to trap all error patterns of weight τ, and we conjecture that the set is minimal.


1.  Let  z,  =  f?]  ,M,(i)  =  =  2. 

2.  Assume  2j_,  <  «„  <  Zj  —  1,  where  2,(>  2j_i)  is  un¬ 
known,  and  covering  monomials  •  •  • ,  Mj(x)  are  used. 

Mj{x)  =  when  j  is  odd,  Mj(x)  =  z"”''  when  j  is 

even.  Beginning  from  the  condition  Um  ^  ~  8®* 

sets  of  upper  bounds  on  lengths  of  ,  •  •  ■ ,  Ur,  Ui ,  •  •  • ,  u^-i 

that  make  any  consecutive  interval  fractions  (ui,»,+i)  are 
not  covered  by  any  monomials  used.  Call  the  sums  of  up¬ 
per  bounds  El,  •••,£/,  where  I  is  the  number  of  possible 
combinations  of  interval  fractions.  Find  the  largest  z,  that 
satisfies  Ei  <  n,  ail  t. 

3. If z_j > k, then {0, M_1(x), ..., M_{j−1}(x)} is the covering set, the size of the set is c = j; stop. Otherwise, let j = j + 1 and go to step 2.

For cyclic codes with rate R < 3/τ, all error patterns can be trapped by binomials or monomials.

Proposition 3: The number of binomials c required to trap every error pattern of weight τ = 3 for an (n, k) code with rate R < 1 is given by


<  C  <  ^ 


min(n  —  3*,  2Ar  —  n  -f  1  + 


n  —  Jb 


where I(n, k, τ) is the number of interval patterns of length n and error weight τ, whose largest fraction is not greater than k.


The above equation gives reasonably tight bounds on the number of binomials. For example, n = 21, k = 15 gives 9 ≤ c ≤ 10, n = 21, k = 18 gives c = 21, and n = 27, k = 24 gives c = 36.


Among  the  attractive  features  of  the  algorithm  is  that  it  can 
be  used  to  decode  past  the  guaranteed  error  correcting  power  of 
the  code,  up  to  complete  hard-decision  decoding.  The  method 
can  also  be  extended  easily  to  soft-decision  decoding,  where  we 
find  the  closest  soft  decision  codeword,  subject  to  a  maximum 
number of hard errors. In [3] we consider the decoding of binary cyclic codes of length 31 or less, including the decoding of error patterns of weight t + 1 or higher. We classify the error patterns to be trapped as (1) all error patterns of weight τ (important in soft decision decoding), (2) all coset leaders of weight τ, and (3) all unique coset leaders of weight τ. Using a combination of analysis,
exhaustive search, computational shortcuts, and a greedy algorithm, we determine the number of covering polynomials needed for the codes under consideration and summarize these results in tables which we do not include in this summary. Simulation of the performance of soft-decision decoding using covering polynomials was also performed to show that approximately 1.5 ~ 2.0 dB
gain  is  achieved  at  a  bit  error  rate  of  10”^  for  soft-decision  using 
covering polynomials that trap error patterns of weight t + 1 or less.

Abstract

The current algebraic geometric (AG) codes are based on the theory of algebraic geometric curves. In this paper, we present a novel approach for construction of AG codes without any background in algebraic geometry. Given an affine plane irreducible curve and all its rational points, based on the equation of this curve, we can find a sequence of monomial polynomials x^i y^j. Using the first r polynomials as a basis of the dual code of a linear code called an AG code, the designed minimum distance d of this AG code can be easily determined. For these codes a fast decoding procedure with complexity O(n^{7/3}), which can correct errors up to ⌊(d−1)/2⌋, is also shown. By this approach it is neither necessary to know the genus of the curve nor to find a basis of differential forms. This approach can be easily understood by most engineers. Some examples are also shown, which indicate that the codes constructed by this approach are better than the current AG codes from the same curves.


Summary 

First we introduce a new method to determine the minimum distance bound for linear codes. Let H ≜ {h_1, h_2, ..., h_r, ...} be a sequence in F_q^n, where h_i = (h_{i1}, h_{i2}, ..., h_{in}), and let S(r) be the linear space over F_q spanned by the first r vectors of H. Let H' ≜ {h'_1, h'_2, ..., h'_u} and let S(r,u) be the linear space over F_q spanned by the first r vectors of H and all vectors of H'. When u = 0, that means H' = ∅ and S(r,u) = S(r). In this paper, we are only interested in such H and H':


for  i<j,  i*jSr,  hj4S(i),  hj,^# 5(r -  1,«),  and  h,,je5(r,H).  (1) 

where h_{i,j} ≜ (h_{i1}h_{j1}, h_{i2}h_{j2}, ..., h_{in}h_{jn}). Let H_r = [h_1, h_2, ..., h_r]^T and H' = [h'_1, h'_2, ..., h'_u]^T. Then the stacked matrix H*_r ≜ [H'; H_r] can be a parity check matrix of a linear code C_r over F_q. When u = 0, H*_r is reduced to H_r.


In order to construct AG codes, we prefer to construct directly some simple {H, H'} for a given curve. For convenience, we restrict H and H' to some special vectors, that is, h_i = (p_i(x_1,y_1), p_i(x_2,y_2), ..., p_i(x_n,y_n)), where the (x_i,y_i) are rational points and p_i(x,y) is a monomial polynomial x^{a_i} y^{b_i}. For simplicity, denote H = {x^{a_1}y^{b_1}, x^{a_2}y^{b_2}, ..., x^{a_i}y^{b_i}, ...}. In the same way, H' = {x^{c_1}y^{d_1}, ..., x^{c_u}y^{d_u}}. In order to construct H and H' such that (1) is satisfied, let us define the order of a polynomial f(x,y). Each polynomial f(x,y) is associated with an integer o_f, which satisfies the following conditions:


"/♦I  =  »/•  if  0/  >  o,  .  and  o,  ,  =  (2) 

For  convenience,  let/be  o/.  We  have  x‘y*  i  a  s  *  b  y  , 

Let I be the set of the orders of polynomials in H, that is, I ≜ {order of x^{a_i}y^{b_i} | i = 1, 2, ...}. If an integer p ∉ I and 0 ≤ p ≤ the order of the last polynomial in H, p is called a gap of I. Let the number of all gaps of I be g*. g* is called the genus of H (or I). Let g' = g* + u. Later we will see that g' plays the same role as g in current AG codes.


If {H, H'} satisfies (1), then we have:

Theorem 1: If r > g*, the minimum distance of the code C_r defined by the parity check matrix [H'; H_r] is at least r − g* + 1. The value r − g* + 1 is called the designed minimum distance.


Thus, construction of good AG codes is now reduced to finding H and H' for a given affine plane curve, such that (1*) condition (1) is satisfied and (2*) g* + u is as small as possible. In the following, for two classes of affine plane curves and for those curves which can be transformed into these classes of curves, we give solutions of H and H' which satisfy (1*) and (2*).

Type I of Affine Plane Curves: f(x,y) = x^a + y^b + g(x,y) = 0, where gcd(a,b) = 1 and a > b > deg g(x,y).

Type II of Affine Plane Curves: f(x,y) = x^a y^c + y^{b+c} + g(x,y) = 0, where gcd(a,b) = 1 and a + c, b + c > deg g(x,y).

Example  :  Let /(x,y)  -  x*y^  +  y’  -v  -  0  over  GF(2’).  We  have  x 
-  7,  y  “  5.  By  this  new  approach,  we  obtain: 

H = {1, y, x, y^2, xy, x^2, y^3, xy^2, x^2y, y^4, x^3, xy^3, x^2y^2, y^5, x^3y, xy^4, x^4, x^2y^3, y^6, x^3y^2, xy^5, x^4y, x^2y^4, y^7, ...},
I = {0, 5, 7, 10, 12, 14, 15, 17, 19, 20, 21, 22, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, ...}, and g* = (a − 1)(b − 1)/2 = 12.

x’.x^.x’.x’.x’y.x^y.x’y.x'y  Kanda-8. 

Thus, we have g' = 12 + 8 = 20. From this example, when r > 12, i.e. r + u > g', d ≥ r − 12 + 1 = r + u − g' + 1, where r + u is the number of check bits in C_r. But from [1, Example 6], the genus is 26.
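The order set I and the gap count g* of this example can be checked with a few lines of code. The sketch assumes, as the example states, that x has order 7 and y has order 5, so that the monomial x^i y^j has order 7i + 5j; the function names are illustrative only.

```python
def monomial_orders(order_x, order_y, limit):
    """Sorted distinct orders i*order_x + j*order_y (i, j >= 0) not exceeding limit."""
    values = set()
    for i in range(limit // order_x + 1):
        for j in range(limit // order_y + 1):
            o = i * order_x + j * order_y
            if o <= limit:
                values.add(o)
    return sorted(values)


def gaps(orders):
    """Integers between 0 and the largest order that are not themselves orders."""
    present = set(orders)
    return [p for p in range(max(orders) + 1) if p not in present]


orders = monomial_orders(7, 5, 35)
g_star = len(gaps(orders))
u = 8  # number of extra monomials in H' quoted in the example
print(orders)
print(gaps(orders), g_star, g_star + u)
```

The gaps found this way are exactly the non-representable integers of the numerical semigroup generated by 5 and 7, whose count is (7 − 1)(5 − 1)/2.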

Remark: There are many affine plane curves which do not belong to these two types but can be transformed into one of them. For example, an affine Hermitian curve is u^{r+1} + v^{r+1} + 1 = 0 over GF(r^2). It can be transformed into x^{r+1} − y^r − y = 0 by

X  =  H-  —  and  y  =  - - i.  where  t'  +  i  -  j'*'  -  -1. 

v-x  v-s 

A fast decoding procedure for these AG codes can be easily realized by the fast decoding in [2-4]. The decoding procedure can correct any ⌊(d−1)/2⌋ or fewer errors with complexity O(n^{7/3}), where d is the designed minimum distance determined by the above theorems.

The Delsarte-Goethals codes DG(m,d) (m = 2t+2 ≥ 4, 2 ≤ d ≤ t+1), introduced by P. Delsarte and J.-M. Goethals in [3], are generalizations of the Kerdock codes. Nonlinear and distance invariant, they are the best codes known for their parameters, and possess formal duals (cf [4], [5]).

For d = t+1, DG(m,d) is the Kerdock code K(m). The automorphism groups of the Kerdock codes are known (cf [1], [2], [6]). We study the automorphism groups of those Delsarte-Goethals codes which are not Kerdock codes: the DG(m,d) codes with m = 2t+2 ≥ 6, 2 ≤ d ≤ t.

We first recount some definitions and properties. m' is the integer m − 1; G, G' and F denote the Galois fields of orders 2^m, 2^{m'} and 2 (respectively), and tr the trace function from G' to F. G'* is the set G' \ {0}.

The Reed-Muller code of order 1, R(1, m), is the set of all the affine forms on the F-space G. The Reed-Muller code of order 2, R(2, m), is the set of all the boolean functions f on G (i.e. the functions from G to F) such that the function φ_f defined by:

∀(x, y) ∈ G^2, φ_f(x, y) = f(0) + f(x) + f(y) + f(x + y)

(where + denotes the addition in F) is bilinear. φ_f is called the symplectic form associated with f.

φ_f is the zero symplectic form if and only if f belongs to R(1, m). A coset of R(1, m) in R(2, m) is the set of all the boolean functions which admit the same associated symplectic form.

The weight w(f) of a boolean function on G is the size of its support: w(f) = |{x ∈ G / f(x) = 1}|.

A function on G is called balanced if its weight is equal to 2^{m−1}. If f belongs to R(2, m), then it is balanced if and only if its restriction to the kernel {x ∈ G / ∀y ∈ G, φ_f(x, y) = 0} of its associated symplectic form is not constant (cf [7]).

A function in R(2, m) is bent if and only if it satisfies one of the following equivalent properties:

- its weight is 2^{m−1} ± 2^{m/2−1}

- its associated symplectic form φ is non-degenerate

- the sum Σ_{x,y ∈ G} (−1)^{φ(x,y)} is equal to 2^m.

If f is bent and g belongs to R(1, m), then f + g is bent.
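The definitions above can be exercised on a small case. The following sketch, with hypothetical helper names, represents a boolean function on G = F^m as a truth table indexed by m-bit integers (addition in G is bitwise XOR) and checks the three bent criteria for the quadratic function f(x) = x_0 x_1 + x_2 x_3 with m = 4. This particular f is chosen only for illustration and is not one of the functions used in the paper.

```python
def symplectic_form(f):
    """phi_f(x, y) = f(0) + f(x) + f(y) + f(x + y), with + in G realised as XOR."""
    size = len(f)
    return [[f[0] ^ f[x] ^ f[y] ^ f[x ^ y] for y in range(size)] for x in range(size)]


def is_bilinear(phi):
    """Check additivity of phi in each argument over F_2."""
    size = len(phi)
    return all(
        phi[x ^ z][y] == phi[x][y] ^ phi[z][y] and phi[y][x ^ z] == phi[y][x] ^ phi[y][z]
        for x in range(size) for z in range(size) for y in range(size)
    )


def kernel(phi):
    """Points x with phi(x, y) = 0 for every y."""
    return [x for x, row in enumerate(phi) if not any(row)]


def character_sum(phi):
    """Sum over all (x, y) of (-1)^phi(x, y)."""
    return sum(1 - 2 * v for row in phi for v in row)


m = 4
bit = lambda x, i: (x >> i) & 1
f = [(bit(x, 0) & bit(x, 1)) ^ (bit(x, 2) & bit(x, 3)) for x in range(1 << m)]
g = [bit(x, 0) ^ 1 for x in range(1 << m)]        # an affine form, i.e. in R(1, m)
f_plus_g = [a ^ b for a, b in zip(f, g)]

phi = symplectic_form(f)
print(sum(f), kernel(phi), character_sum(phi), sum(f_plus_g))
```

For this f the weight is 2^{m−1} − 2^{m/2−1}, the kernel of the form is trivial, and the character sum equals 2^m, so the three criteria agree; adding the affine form g leaves the symplectic form unchanged.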


t-d+1 

are  the  functions  on  G’  :  x  tr(  X  Vj  +  U*)  where 

i=l 

V  =  (vi . vi.<l+l)  ranges  over  G'  and  1  over  R(!,m'). 

If C is a set of boolean functions on a set G, an automorphism of C is any permutation ψ on G such that, for any f in C, f ∘ ψ is an element of C. More generally, we will call homomorphism of C any mapping from G to G satisfying the same condition.

We prove that the automorphism group of DG(m,d) is the set of all the permutations on G of the type:

(x, ε) → (a x^{2^k} + b, ε + δ), (a, b ∈ G', a ≠ 0, δ ∈ F, k = 0, ..., m−2).

So, it is the same as that of the Kerdock code of the same length. Beforehand, we need to characterize the linear homomorphisms of the code C(m', d) (i.e. those linear mappings from G' to G' which are homomorphisms of the code C(m', d)).

THEOREM 1

The linear homomorphisms of C(m', d), m' = 2t+1 ≥ 5, 2 ≤ d ≤ t, are:

. the permutations on G': x → a x^{2^k}, where a ranges over G'* and k = 0, ..., m' − 1

. the functions x → b tr(ax), where a and b range over G'.

THEOREM 2

The automorphisms of DG(m,d) (m = 2t+2 ≥ 6, 2 ≤ d ≤ t) are all the permutations on G:

(x, ε) → (a x^{2^k} + b, ε + δ), G' × F → G' × F,

where a ranges over G'*, b over G', δ over F, and k = 0, ..., m − 2.

To achieve the proof of this theorem, we prove two lemmas:

LEMMA 1

For any element (u, u', u'') of G'^3 and any element v of G'^{t−d+1} (2 ≤ d ≤ t), the functions (x, ε) → θ(ux, ε) + θ(u'x, ε) + θ(u''x, ε) and (x, ε) → ρ_v(x, ε) belong to the same coset of R(1, m) if and only if:

1) v = 0

2) u + u' + u'' = 0

3) one of the elements u, u', u'' is equal to 0.

LEMMA 2

Let v be any element of G'^{t−d+1} (2 ≤ d ≤ t) such that, for any non-zero element w of G', the function (x, ε) → θ(wx, ε) + ρ_v(x, ε) is bent. Then v is equal to 0.


G is identified with G' × F (as a linear space). Let θ(x, ε) be the boolean function on G defined by:

θ(x, ε) = tr(Σ_{i=1}^{t} x^{2^i+1}) + ε tr(x)

and, for any element v = (v_1, ..., v_{t−d+1}) of G'^{t−d+1} where 2 ≤ d ≤ t, let ρ_v(x, ε) be the function defined by

ρ_v(x, ε) = tr(Σ_{i=1}^{t−d+1} v_i x^{2^i+1}).

The function θ is bent (cf [7] p.460, th 18). Its associated symplectic form is the following function on G^2:

((x, ε), (y, η)) → tr(x) tr(y) + tr(xy) + ε tr(y) + η tr(x).

If u and u' are two distinct elements of G', then the function θ(ux, ε) + θ(u'x, ε) is bent. ρ_v is not bent since ρ_v(x, ε) is independent of ε.
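The stated symplectic form of θ can be checked numerically in the smallest odd case m' = 3 (t = 1), which lies below the range m' ≥ 5 treated in the theorems and is used here only because it is small. The sketch assumes θ(x, ε) = tr(Σ_{i=1}^{t} x^{2^i+1}) + ε tr(x), represents G' = GF(8) modulo x^3 + x + 1 (an arbitrary choice of irreducible polynomial), and uses hypothetical function names.

```python
def gf8_mul(a, b):
    """Multiply two elements of GF(8) = GF(2)[x] / (x^3 + x + 1)."""
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return r


def tr(x):
    """Trace from GF(8) to GF(2): x + x^2 + x^4, returned as 0 or 1."""
    x2 = gf8_mul(x, x)
    x4 = gf8_mul(x2, x2)
    return x ^ x2 ^ x4


def theta(x, e):
    """theta(x, e) = tr(x^3) + e * tr(x), the case t = 1 of the definition."""
    x3 = gf8_mul(gf8_mul(x, x), x)
    return tr(x3) ^ (e & tr(x))


def form_from_definition(x, e, y, h):
    """phi_theta via f(0) + f(x) + f(y) + f(x + y) on G = G' x F."""
    return theta(0, 0) ^ theta(x, e) ^ theta(y, h) ^ theta(x ^ y, e ^ h)


def form_closed(x, e, y, h):
    """Closed form tr(x)tr(y) + tr(xy) + e tr(y) + h tr(x) quoted in the text."""
    return (tr(x) & tr(y)) ^ tr(gf8_mul(x, y)) ^ (e & tr(y)) ^ (h & tr(x))


points = [(x, e) for x in range(8) for e in range(2)]
agree = all(form_from_definition(x, e, y, h) == form_closed(x, e, y, h)
            for (x, e) in points for (y, h) in points)
weight = sum(theta(x, e) for (x, e) in points)
print(agree, weight)
```

With m = 4 a bent function has weight 2^3 ± 2^1, so the printed weight should be 6 or 10 if θ is bent in this small case.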

The Delsarte-Goethals code DG(m,d) is the non-linear subcode of R(2, m) whose elements are all the functions:

(x, ε) → θ(ux, ε) + ρ_v(x, ε) + l(x, ε)

where u ranges over G', v over G'^{t−d+1}, and l over R(1, m).

We denote by C(m', d) the linear code of length 2^{m'} whose elements


COROLLARY

The automorphisms of the shortened Delsarte-Goethals code DG(m,d)* of length (2^m − 1) are all the permutations on G' × F \ {(0, 0)}: (x, ε) → (a x^{2^k}, ε), where a ranges over G'* and k = 0, ..., m' − 1.
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Abstract 

We consider two families of exceptionally good double-error correcting codes: the Zetterberg binary codes and the Dumer-Zinoviev quaternary codes. The Zetterberg codes are the best known family of double-error correcting binary linear codes. They are longer than the Bose-Chaudhuri-Hocquenghem double-error correcting codes of the same redundancy. The quaternary Dumer-Zinoviev codes are the only known q-ary double-error correcting codes which asymptotically meet the Hamming bound for q > 3.

We derive simple criteria to decide whether 1, 2 or 3 errors have occurred when one of these codes is used for data transmission. Based on these criteria, new decoding algorithms are proposed, which are faster and simpler to implement than the known ones. There are two main improvements over the known algorithms. First, a quadratic equation only has to be solved when two errors have occurred. Secondly, some calculations, especially the inversion, can be carried out in a field considerably smaller than the ground field.

Summary 

In this paper we present algebraic decoding algorithms for two classes of double-error correcting codes: the Zetterberg binary codes [1] and the Dumer-Zinoviev quaternary codes [5].

Let n = 2^{2s} + 1, s > 1 and let α be a primitive n-th root of unity in the finite field GF(2^{4s}). The Zetterberg code C_s is a binary cyclic code of length n generated by the minimal polynomial g_s(x) of α over GF(2). The code C_s has dimension k = n − 4s, minimum Hamming distance 5 and covering radius 3. The Zetterberg codes are the best known family of double-error correcting binary linear codes. They are longer than the BCH double-error correcting codes of the same redundancy.

The known decoding algorithm [2] requires solving a quadratic equation in order to decide whether 2 or 3 errors have occurred. We derive a simple criterion which makes it possible to determine in advance the number of errors and suggest a new algorithm with considerably lower time and space complexity.

Let e(x) be an error vector and denote by S_i = e(α^i) the syndromes. Let Tr(ε) = ε + ε^2 + ε^{2^2} + ... + ε^{2^{2s−1}} be the trace function from GF(2^{2s}) to GF(2). Set γ = S_1 S_{−1}.

Lemma 1: γ = 1 iff one error has occurred.

Lemma 2: Tr(γ^{−1}) = 1 iff two errors have occurred.

Based  on  the  above  lemmas  the  following  algorithm  has  been 
established  [3]  [4]. 


Step 1. Calculate S_1 = r(α) and go to step 2.

Step 2. If S_1 = 0 then no error has occurred. Otherwise go to step 3.

Step 3. Calculate γ = S_1^n; if γ = 1 there is a single error with locator S_1. Otherwise go to step 4.

Step 4. Calculate γ^{−1} and Tr(γ^{−1}); if Tr(γ^{−1}) = 1 go to step 5. Otherwise three errors have occurred.

Step 5. Two errors. Solve the equation δ^2 + δ + γ^{−1} = 0 and correct two errors on positions α^i = S_1 δ, α^j = α^i + S_1.

Notice that since γ ∈ GF(2^{2s}) the computation of γ^{−1} can be done in GF(2^{2s}). The results about decoding complexity for the codes C_2 and C_3 show that the new algorithm has considerably lower time and space complexity compared to the known one whenever 2 or 3 errors have occurred.
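The five steps can be tried out on the smallest case s = 2 (n = 17, field GF(2^8)). The sketch below is a toy illustration under several assumptions that are not from the paper: the field is built modulo x^8 + x^4 + x^3 + x^2 + 1, the quadratic in step 5 is solved by brute force rather than by any fast method, error locators are taken as α^i = S_1 δ, and an error pattern is represented directly by its list of positions.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
    return r


def gf_pow(a, e):
    """Square-and-multiply exponentiation in GF(2^8)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r


N = 17  # n = 2^(2s) + 1 with s = 2
# Since 255 = 15 * 17, any g^15 different from 1 has order exactly 17.
ALPHA = next(a for a in (gf_pow(g, 15) for g in range(2, 256)) if a != 1)
LOCATOR = {gf_pow(ALPHA, i): i for i in range(N)}


def syndrome(error_positions):
    """S_1 = e(alpha) for an error vector given by its nonzero positions."""
    s = 0
    for i in error_positions:
        s ^= gf_pow(ALPHA, i)
    return s


def trace16(c):
    """Trace from the subfield GF(16) to GF(2): c + c^2 + c^4 + c^8."""
    t, p = 0, c
    for _ in range(4):
        t ^= p
        p = gf_mul(p, p)
    return t


def decode(s1):
    """Return the sorted error positions, or None when three errors are declared."""
    if s1 == 0:                       # step 2: no error
        return []
    gamma = gf_pow(s1, N)             # step 3: gamma = S_1^n lies in GF(16)
    if gamma == 1:
        return [LOCATOR[s1]]
    gamma_inv = gf_pow(gamma, 254)    # step 4
    if trace16(gamma_inv) != 1:
        return None
    # step 5: brute-force a root of delta^2 + delta + gamma^-1 = 0
    delta = next(d for d in range(256) if gf_mul(d, d) ^ d == gamma_inv)
    x = gf_mul(s1, delta)
    return sorted([LOCATOR[x], LOCATOR[x ^ s1]])


print(decode(syndrome([3])), decode(syndrome([2, 11])))
```

Either root of the quadratic may be used: the two roots differ by 1, which simply exchanges the two error locators.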

Similarly, we establish a new decoding algorithm for the irreducible Dumer-Zinoviev codes [6]. Notice that these are the only known q-ary double-error correcting codes which asymptotically meet the Hamming bound for q > 3.
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Abstract 

We  propose  a  fast  decoding  algorithm  for  a  class  of  geometric  Goppa  codes  defined  on  certain  algebraic  plane 
curves,  associated  with  Artin-Schreier  extensions  of  ^((x),  introduced  by  Stichtenoth  [3].  Although  we  do  not 
attempt  here  to  treat  all  the  class  of  curves  introduced  by  Stichtenoth,  we  do  include  certain  elliptic,  hyperelliptic 
and  Hennitian  ctirves.  These  curves  are  defined  by  the  homogeneous  equation  over  an 

arbitrary finite field F_q of characteristic p, where a and b are relatively prime integers such that a = p^v (v ∈ N*), a < b and the zeros of y^a + y form an additive subgroup of F_q of order p^v. The main step of the proposed algorithm is to solve a key equation studied by Porter, Shen and Pellikaan [1]. For this purpose, we derive explicit formulas for certain differential forms, which are used to construct the syndrome of the codes defined on the above-mentioned curves, and propose a modified version of Sakata's algorithm [4]. Further, we prove, in work inspired by Shen's study

[2],  that  the  Porter-Shen-Pellikaan  key  equation  for  codes  defined  on  the  curves  treated  here  can  be  solved  by  using 
our  modified  Sakata  algorithm  with  complexity  0(dj„a  +  g*a),  where  is  the  designed  minimum  distance  and  g 
is  the  genus  of  the  curve.  The  proposed  decoding  algorithm  may  be  regarded  as  an  extension  of  Shen’s  algorithm 
[2]  for  Hermitian  codes  to  a  wider  class  of  codes.  For  certain  hyperelliptic  codes,  this  algorithm  can  decode  up  to 
[(dsu  ~  l)/2j  errors  with  complexity  O(n’),  where  n  is  the  word  length  of  the  code. 

Abstract: We investigate the expansion factor of a linear block code. The expansion factor Δ(C) of an (n,k;d) code with codebook C is the maximum, over all generator matrices G whose row space is C, of the minimum of w(xG) − w(x) over all nonzero inputs x. One can view Δ(C) as a measure of the "continuity" of the code. It indicates how well the code preserves and expands the distance relations of the input. We show that the expansion factor is bounded as d − k ≤ Δ(C) ≤ d − 1. We also relate it to the weight distribution of the code, and the output length n. Finally we find the expansion factor for a number of codes, including Hamming, Equidistant, Golay, and BCH codes.

Introduction 

Consider the problem of error-control coding over the q-ary symmetric channel with crossover probability p/(q−1) using linear block codes. Given an (n,k;d) code with alphabet A and codebook C, let S(C) be the set of all generator matrices whose row space is C. The expansion factor δ(G) of G ∈ S(C) is defined as

δ(G) = min_{x ∈ A^k, x ≠ 0} (w(xG) − w(x)).

The expansion factor Δ(C) of the code is given by

Δ(C) = max_{G ∈ S(C)} δ(G).

We also pick G* in S(C) with δ(G*) = Δ(C) and call it the expansion matrix of C. The expansion factor indicates how well the code preserves and expands the distance relations of the input. This is important when the code is a stage in a cascade of codes, and generally whenever the input distances are meaningful. It can be helpful to think of Δ(C) as a measure of "continuity", and G* as the most "continuous" generator matrix in S(C). Thus given C and C' with the same parameters except for Δ(C) > Δ(C'), one should choose C and use its G*.


0 ≤ i ≤ d − 1
d ≤ i ≤ k + δ_2
k + δ_2 + 1 ≤ i ≤ n


The next result combines Proposition 2 and the fact that the total weight of a linear code is given by n(q − 1)q^{k−1} (the basis of the Plotkin bound [5]).

Proposition 3: The length n of an (n,k;d) code C with expansion factor Δ(C) ≥ δ_3 satisfies

A 

n  S  N(k-,d-,Si)  = - - - tt 

(<7  -  1)  <7*-* 

where 

Si  a  h  ^ 

£  ibi 
i=d 


Proposition 2 yields an upper bound to Δ(C) in terms of the weight distribution. Proposition 3 yields a looser upper bound to Δ(C) in terms of n.

Examples 

We find the expansion factor and the expansion matrix for several Hamming, Golay, and BCH codes ([4], [5]). We also discuss examples of expansion codes [1], for which the upper bound of Proposition 1 is achieved (Δ(C) = d − 1), and equidistant codes [5], for which the lower bound of Proposition 1 is achieved (Δ(C) = d − k). We compare the values of Δ(C) with the estimates found from Propositions 2 and 3. For instance, the familiar (7,4;3) Hamming code has Δ(C) = 0, and Propositions 2 and 3 yield the upper bounds δ_2 = δ_3 = 1. And for the (15,5;7) BCH code, Δ(C) = 4, δ_2 = 4, and δ_3 = 5.
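For small binary codes the expansion factor can be found by exhaustive search. The sketch below is a brute-force illustration with hypothetical names, not the authors' method: codewords and rows are stored as integers, δ(G) is computed from its definition, and Δ(C) is obtained by trying every basis of the code. The systematic generator matrix chosen for the (7,4;3) Hamming code is one standard form and is an assumption of this example.

```python
from itertools import combinations


def weight(v):
    """Hamming weight of a binary vector stored as an integer."""
    return bin(v).count("1")


def encode(x, rows):
    """xG over GF(2); bit i of the input x selects rows[i]."""
    c = 0
    for i, row in enumerate(rows):
        if (x >> i) & 1:
            c ^= row
    return c


def delta(rows):
    """delta(G): minimum of w(xG) - w(x) over all nonzero inputs x."""
    return min(weight(encode(x, rows)) - weight(x) for x in range(1, 1 << len(rows)))


def span(rows):
    """All codewords generated by the given rows."""
    return {encode(x, rows) for x in range(1 << len(rows))}


def expansion_factor(rows):
    """Delta(C): maximum of delta(G) over all bases G of the row space."""
    code = span(rows)
    nonzero = sorted(code - {0})
    best = None
    # Row order does not change delta(G), so unordered bases suffice.
    for basis in combinations(nonzero, len(rows)):
        if len(span(basis)) == len(code):
            value = delta(basis)
            best = value if best is None else max(best, value)
    return best


# Systematic generator matrix of a (7,4;3) Hamming code, one row per integer.
HAMMING_7_4 = [0b1000110, 0b0100101, 0b0010011, 0b0001111]
print(delta(HAMMING_7_4), expansion_factor(HAMMING_7_4))
```

A systematic generator matrix always satisfies δ(G) ≥ 0, because the information symbols reappear in the codeword; the search confirms that no other basis of this Hamming code does better.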


To further clarify the notion of expansion factor, consider an encoder described by a generator matrix G with expansion factor δ(G), and a pure error detection decoder. An information word x is mapped into codeword y, which is sent over a noisy channel. If the channel output is a codeword y' with corresponding information word x', then

d(x,x') ≤ d(y,y') − δ(G)


Related  Work 

The notion of expansion factor can be modified by taking into account the fact that, on normal channels, error patterns of large weight occur with very small probability. Thus one can define a bounded expansion factor, where only low weight codewords are included in the computation of δ(G) and Δ(C). This idea is related to the work in [3].


This suggests that for a fixed codebook C, given a choice of generator matrices, one should pick G* to minimize the number of input errors caused by channel errors. An upper bound to the bit error probability as a function of δ(G) follows from the above inequality. The bound is similar to the one in [2].

Results 

We  list  our  theoretical  results.  The  proofs  are  omitted  in  this 
summary. 

Proposition 1: The expansion factor Δ(C) of an (n,k;d) code C is bounded by

d − k ≤ Δ(C) ≤ d − 1

Let the weight distribution b_i be the number of codewords of weight no greater than i, 0 ≤ i ≤ n.

Proposition 2: The weight distribution b_i of an (n,k;d) code C with expansion factor Δ(C) ≥ δ_2 satisfies

Shot-noise processes, also known as filtered point processes, constitute an important class of mathematical models used to understand physical phenomena ranging from the measurement of nerve impulses in the brain, to the formation of images on film exposed under low-level illumination, to the electric current generated by photodiodes used in optical communication systems. Hence, it is unfortunate that in most cases, shot-noise densities must be obtained by using special techniques such as contour integration to numerically invert their characteristic functions. The purpose of this presentation is to suggest a new method for computing both shot-noise cumulative distributions and densities. In fact, as shown in [1], the method is quite general and can be used to recover any continuous cumulative distribution from its characteristic function without numerical integration.

Consider the real-valued process {Z_t} given by

Z_t = Σ_ν A_ν h(t − T_ν),

where the {T_ν} are points of a Poisson process with nonnegative intensity λ(·) and {A_ν} is an independent, identically distributed, nonnegative "gain" sequence. We assume that the sequences {A_ν} and {T_ν} are independent of each other. The deterministic function h is the system impulse response or point spread function, depending on the application.

Let t be fixed and set g(τ) = h(t − τ). We now focus on the random variable

Y ≜ Z_t = Σ_ν A_ν g(T_ν).

Let F(y) denote the cumulative probability distribution of Y. We assume that F is continuous everywhere except the origin, where it has a jump discontinuity of size e^{−B}. Let Γ_0 denote the measure defined by

Γ_0(C) = ∫ λ(τ) dτ.

If Γ_0 is a finite measure with density γ_0, then (i) B = P(A_ν > 0) · Γ_0(ℝ), (ii) the function γ, defined by

a  r-f7oWAy),  1 

— — — /(4.>0)J, 

is  integrable,  and  (iii)  its  Fourier  transform, 

is  well  defined.  It  is  shown  in  [1]  that 
Cr{y)  4  lim  f;  -  l]e--’"'»/^,  (1) 


where  =  —Unr  for  n  odd,  hs  =  1/2,  and  b„=:0  otherwise,  is 
well  defined  and  that 


P{Y  >  V) 


e-Wy)  +  f(v)l,  »>o, 

e-®((7'(y)  +  t)(y)  -I- 1),  y<  0, 


where 


viy)  =  f'iWd0. 


Furthermore,  if  O'  is  absolutely  continuous,  then  the  cumulative 
distribution  F(y)  =  1  -  P(K  >  y)  has  density 


/(y)  =  « 


-B 


Hv)  +  7(y)  -  • 


In many instances [1], the functions $\eta$ and $\gamma$ can be computed
in closed form, and the Fourier transform of $\gamma$ can be expressed in
terms of special functions. In these cases, only the series in (1)
requires numerical work. We approximate (1) by taking $L$ finite and
replacing the infinite sum with a finite sum. In the examples we have
considered, the Fourier transform decays quickly enough that, since
$b_n$ decays like $1/n$, the terms of the series are absolutely summable.
The series therefore converges to a continuous function. This implies
that the Gibbs phenomenon will not be present. Samples from the Fourier
series are computed with a fast Fourier transform, and a cubic spline
is then fit to the samples. To approximate the derivative of the series
in (1), we simply differentiate the cubic spline between its knots. The
result of applying this technique to [1, Example 2] is shown below in
Fig. 1.
Fig. 1. Approximation of $f(y)$. Impulse at origin not shown.
Cochannel and intersymbol interference can often
be modeled as the sum of an infinite series of random
variables with weights which decay rapidly. One
important example is provided by interference which
results from passing data through a causal filter
consisting of lumped elements. For many practical
applications, this model yields a random variable with
a distribution which is singular (concentrated on a
set of measure zero) but diffuse (non-atomic), such as
the Cantor distribution. Because this model is
intermediate between the case of random variables with
a density function and the case of discrete random
variables, there are no well-developed mathematical
tools for calculating expectations. A trapezoidal
rule for evaluating expectations is developed in this
paper. Upper and lower bounds on expected values are
also given. Finally, with a view to future
applications, some of the mathematical properties of
the associated distribution function are examined.

We consider here modeling interference as a random
variable

$$Z = \sum_{k=1}^{\infty} a_k X_k,$$

where $\{X_1, X_2, \ldots\}$ is a sequence of independent
identically distributed random variables, each of
which can take on one of $M$ possible values, and where
$\{a_1, a_2, \ldots\}$ is a known sequence, possibly
vector-valued. In the applications envisaged here,
$X_k$ represents the k-th interfering information digit,
while $a_k = h(kT)$, where $h(t)$ is the channel impulse
response function and $T$ is the sampling interval.
Thus the terms in the sequence $\{a_k\}$ typically decay
fairly rapidly, and the sum is not one to which the
central limit theorem applies. Indeed, for fairly
typical channel parameters, the distribution can be of
Cantor type, concentrated on a nondenumerable set of
Lebesgue measure zero [1]. Such distributions have
neither density functions nor discrete probabilities.

For error probability calculations on
communication channels, one typically wishes to
calculate $E[g(Z)]$ for some smooth function $g$. Because
of the special nature of the random variable, we
cannot employ either of the two standard tools,
involving an integral of a density function, or an
infinite series with weights equal to the
probabilities, to compute $E[g(Z)]$. It is possible to
express $Z$ as a discontinuous function defined on the
unit interval (0,1) and to write $E[g(Z)]$ as an
integral on this interval. This integral can be
approximated arbitrarily well by a trapezoidal
integration rule. The only condition which must be
assumed on the coefficients is that the series $\sum_k a_k$
converge absolutely. If they satisfy the stronger
condition $a_k = O(r^k)$ for $r < M^{-1}$, then the distribution
is singular. Fairly simple but tight upper and lower
bounds for $E[g(Z)]$ can also be obtained with the aid
of Jensen's inequality, under mild restrictions on the
coefficients and on the convexity of the function $g$.
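
The construction described above can be illustrated with a minimal sketch. It is not the paper's rule in detail, only one way to read it, under these assumptions: each digit $X_k$ is binary and equiprobable on {0, 1} (so $M = 2$), the series is truncated after `depth` terms, and $u$ in [0, 1] is mapped to $Z$ through its binary digits. The names `z_of_u` and `trapezoid_expectation` are hypothetical.

```python
def z_of_u(u, coeffs):
    """Map u in [0, 1] to Z = sum_k a_k * X_k, with X_k the k-th binary digit of u."""
    z = 0.0
    for a in coeffs:
        u *= 2.0
        digit = min(int(u), 1)  # clamp so that u = 1 maps to the all-ones expansion
        u -= digit
        z += a * digit
    return z


def trapezoid_expectation(g, coeffs, depth):
    """Composite trapezoidal rule for the integral of g(Z(u)) over [0, 1], 2**depth cells."""
    n = 2 ** depth
    total = 0.5 * (g(z_of_u(0.0, coeffs)) + g(z_of_u(1.0, coeffs)))
    for j in range(1, n):
        total += g(z_of_u(j / n, coeffs))
    return total / n


# Assumed example: a_k = 2 * 3**(-k), which gives the classical Cantor distribution.
depth = 12
cantor_coeffs = [2.0 * 3.0 ** (-k) for k in range(1, depth + 1)]
mean = trapezoid_expectation(lambda z: z, cantor_coeffs, depth)
second_moment = trapezoid_expectation(lambda z: z * z, cantor_coeffs, depth)
```

With these coefficients $r = 1/3 < 1/M$, so the limiting distribution is singular, yet the rule still reproduces the known Cantor moments (mean 1/2, second moment 3/8) to within the truncation error.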

Some graphs of error probability as a function of
signal to noise ratio and channel bandwidth are given
to illustrate the possibilities. Typically, for small
bandwidth, the interference is the dominant effect,
while for large bandwidth the noise dominates. For
some receiver structures, there is an intermediate
bandwidth at which the error probability is smallest.
These results extend and improve the results of
Wittke, Smith, and Campbell [1].

Because the distribution of $Z$ is rather
pathological, it would probably be useful to
understand its properties better. As was mentioned
above, the set of possible values of $Z$ is frequently
nondenumerable, but of Lebesgue measure zero. In
these circumstances, the Hausdorff (fractional)
dimension of this set provides a finer measure of its
size. A calculation of this dimension is difficult in
general, but for the case $a_k = O(r^k)$, for $r < M^{-1}$, the
dimension is bounded above by

$$-(\log M)/(\log r).$$

This bound approaches one as $r \to 1/M$ and it approaches
zero as $r \to 0$. The calculation of this dimension
provides some additional insight into the significance
of a result of Garsia [2] about entropy and
singularity of infinite convolutions. Also, when the
distribution function is singular, but continuous, its
derivative can be evaluated as a generalized function
(Schwartz distribution) which is neither an
integrable function, nor a series of impulse
functions. The properties of this derivative can also
be related to the Hausdorff dimension mentioned above.
The Asymptotic Equivalence of Investing
with  and  without  Replacement 

Thomas  M.  Cover 


Abstract  and  Summary 


Consider the following scenarios for a sequence of vectors
$x_1, x_2, \ldots, x_n$ of price relatives corresponding to the history of a
finite collection of stocks over a period of n investment periods:
1)  Nothing  is  known  about  the  sequence  of  vectors;  2)  The  vec¬ 
tors  in  the  sequence  are  known,  although  the  order  is  not  known 
and  the  vectors  are  drawn  independently  with  replacement  from 
this  set;  3)  The  collection  of  vectors  is  known,  although  the  or¬ 
der  is  unknown,  but  the  vectors  are  drawn  without  replacement 
from  this  set. 

Clearly  the  amount  of  wealth  that  can  be  generated  in  these 
scenarios  increases  as  the  amount  of  information  increases.  For 
example,  in  scenario  2  one  knows  the  empirical  distribution  of 
the  market,  whereas  in  1,  one  does  not.  In  3,  end-play  can  be 
used. 

We shall argue, for bounded vector sequences, that the
universal portfolio algorithm [1]

$$\hat{b}_{k+1} = \frac{\int b \prod_{i=1}^{k} b^t x_i \, db}{\int \prod_{i=1}^{k} b^t x_i \, db}$$

for scenario 1 will perform as well to first order in the exponent as
the best algorithms in scenarios 2 and 3. Thus even end-play on a
known collection of vectors of price relatives cannot outperform
this universal portfolio based on no knowledge whatsoever, at
least to first order in the exponent.
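
The universal portfolio is a wealth-weighted average of all constant portfolios, which a short numerical sketch makes concrete. The sketch assumes only two stocks, so a portfolio is a single fraction `w` held in stock 1, and it replaces the integrals by a midpoint sum over a grid. The function name and the grid size are hypothetical choices, not part of [1].

```python
def universal_portfolio_weight(price_relatives, grid=1000):
    """Next-period fraction of wealth in stock 1 (two-stock case).

    price_relatives: list of (x1, x2) pairs observed so far.
    """
    numerator = 0.0
    denominator = 0.0
    for j in range(grid):
        w = (j + 0.5) / grid  # midpoint of the j-th cell of [0, 1]
        wealth = 1.0
        for x1, x2 in price_relatives:
            # one-period growth factor of the constant portfolio (w, 1 - w)
            wealth *= w * x1 + (1.0 - w) * x2
        numerator += w * wealth
        denominator += wealth
    return numerator / denominator


first_weight = universal_portfolio_weight([])
second_weight = universal_portfolio_weight([(2.0, 1.0)])
```

Before any data the weight is 1/2. After one period in which stock 1 doubles and stock 2 is flat, the exact ratio of integrals is 5/9, so the portfolio tilts toward the stock that did well without using any prior knowledge of the sequence.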

The growth rate of wealth in all three scenarios is given, to first
order in the exponent, by the doubling rate $W^*$ (a generalization
of entropy rate), which is given by

$$W^* = \max_{b} \frac{1}{n} \sum_{i=1}^{n} \log b^t x_i,$$

where $x_1, x_2, \ldots, x_n$ is the sequence of vectors of price relatives
for the n trading days, and the maximization is over all portfolios

$$b = (b_1, b_2, \ldots, b_m), \quad b_i \ge 0, \quad \sum_{i=1}^{m} b_i = 1.$$

Thus the wealth

$$S_n = \prod_{i=1}^{n} b_i^t x_i$$

for the best portfolio algorithm $b_i$ in each scenario is given by
$S_n = 2^{nW^* + o(n)}$.
STATE PRICES AND GIBBS STATES


Michael  J.  Stutzer 
Dept,  of  Finance 
Carlson  School  of  Management 
University  of  Minnesota 


ABSTRACT 

The  foundation  for  the  theory  of  asset  prices  in 
the  absence  of  riskless  arbitrage  opportunities  is  the 
existence  and  use  of  normalized  Arrow-Debreu  state 
prices,  also  called  a  risk  neutral  probability  distribution. 
Under  this  distribution,  an  asset’s  price  is  predicted  to 
be  the  risklessly  discounted,  present  value  of  its  future 
payoff.  The  Bayesian,  information  theoretic  view  of 
inference  directs  us  to  use  a  generalized  exponential 
distribution  solving  a  constrained  entropy  problem  (i.e. 
a  Gibbs  state),  as  an  estimator  of  these  risk-neutral 
probabilities.  Use  of  the  Gibbs  state  provides  simple 
derivation  of  powerful  asset  pricing  predictions,  and  also 
uncovers  an  isomorphism  between  statistical  mechanics 
and  asset  pricing,  paving  the  way  for  future  development 
of  new  asset  pricing  predictions  which  exploit  the 
isomorphism. 

The  paper  uses  only  simple  mathematics  to  make 
these  points,  and  is  self-contained,  in  the  hope  that  it 
will  stimulate  additional  interdisciplinary  interest  in 
financial  economics. 


SUMMARY 

It  is  well-known  that  prices  of  contingent  claims 
in  complete  and  arbitrage-free  securities  markets  can  be 
computed  using  normalized  Arrow-Debreu  state  prices, 
also  called  risk  neutral  probability  measures.  This  paper 
uses  simple  mathematics  to  explore  the  value  of  a 
Bayesian  approach,  called  the  maximum  entropy 
formalism  (MEF),  in  selecting  a  risk  neutral  probability 
measure  in  situations  of  incomplete  financial  markets. 
The  investigation  is  conducted  within  what  is  perhaps 
the  simplest  possible  multiperiod  setting,  i.e.  a  discrete 
time  approximation  to  a  correlated  exponential  Wiener 
vector  process.  The  resulting  risk  neutral  probability 
measure is from an exponential family called canonical
distributions or Gibbs states.
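
As an illustration of what such a Gibbs state looks like in the simplest case, the sketch below tilts a prior distribution over finitely many states exponentially until the expected excess return is zero. Everything specific here is an assumption made for illustration: a single risky asset, a one-period setting, a strictly positive prior, and the hypothetical function name `gibbs_state`. It is not the paper's multiperiod construction.

```python
import math


def gibbs_state(excess_returns, prior=None, iterations=200):
    """Probabilities q_i proportional to prior_i * exp(theta * r_i) with sum_i q_i * r_i = 0."""
    n = len(excess_returns)
    if prior is None:
        prior = [1.0 / n] * n
    if min(excess_returns) >= 0.0 or max(excess_returns) <= 0.0:
        raise ValueError("excess returns must take both signs (no arbitrage)")

    def tilt(theta):
        weights = [p * math.exp(theta * r) for p, r in zip(prior, excess_returns)]
        total = sum(weights)
        return [w / total for w in weights]

    def mean(theta):
        return sum(q * r for q, r in zip(tilt(theta), excess_returns))

    # The tilted mean is increasing in theta, so bracket its root and bisect.
    lo, hi = -1.0, 1.0
    while mean(lo) > 0.0:
        lo *= 2.0
    while mean(hi) < 0.0:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mean(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return tilt(0.5 * (lo + hi))


returns = [-0.1, 0.0, 0.2]
q = gibbs_state(returns)
```

Among all distributions that price the asset correctly, this exponential tilt is the one closest to the prior in relative entropy, which is the sense in which it solves a constrained entropy problem.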

Gibbs states have a form which facilitates passing
to the continuous time limit. Doing so shows that the
limiting Gibbs state is parametrized by a vector of
parameters which, normalized, are the portfolio
weights in the familiar mean-variance efficient tangency
portfolio of the observed risky assets. This limiting
Gibbs state is used to produce a multi-beta,
approximate arbitrage pricing theory, which linearly
restricts asset excess returns and covariances with N
observable traded factors. The coefficient vector in the
linear relation is the vector of the factors' weights in the
canonical mean-variance efficient portfolio of the N
traded factors and the riskless asset. Large deviations
theory is then used to provide a frequentist rationale for
the canonical distribution, which in turn is used to
describe the isomorphism between the statistical
mechanics of large physical systems and the
arbitrage-free pricing of contingent claims in continuous
time.

To  illustrate  the  potential  for  exploiting  this 
isomorphism  to  generate  testable  predictions  in  complex 
circumstances,  we  used  it  to  predict  the  change  in  a 
country’s  riskless  interest  rate  following  integration  of  its 
bond  market  into  that  of  another  “country”  (e.g.  the  rest 
of  the  world).  The  prediction  is  that  the  post-integration 
interest  rate  will  be  a  weighted  average  of  the  countries’ 
pre-integration  interest  rates,  with  the  weighting 
dependent  on  the  respective  countries’  tangency 
portfolios. 
On the Optimality and Stability of
Exponential Twisting in Monte Carlo Estimation¹

John S. Sadowsky², Purdue University
School of Electrical Engineering, West Lafayette, IN 47907-1285


Let $P(\cdot)$ be a probability distribution on $\mathbb{R}$, and let $\mathcal{P}_P(\cdot)$
denote the i.i.d. distribution for the sequence $\{X_k\}$ with marginal
$P(\cdot)$. Define $S_n = \sum_{k=1}^{n} X_k$ and consider the probability
$p_n = \mathcal{P}_P(S_n > \gamma n)$ where $\gamma > E_P[X_k]$. By Cramér's theorem we have
$p_n \sim_{LD} \exp(-I(\gamma) n)$, where $I(\gamma)$ is the convex conjugate of
$\Lambda(\alpha) = \log(E_P[e^{\alpha X_k}])$. The notation $a_n \sim_{LD} \exp(cn)$ means
$\lim_{n\to\infty} \log(a_n)/n = c$. This should not be confused with
$a_n \sim b_n$, which means $\lim_{n\to\infty} a_n/b_n = 1$.

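This paper considers the problem of estimating $p_n$ via the
Monte Carlo technique of importance sampling. Let $M_P$ denote
the family of all distributions $Q(\cdot)$ such that $P(\cdot) \ll Q(\cdot)$.
Independent i.i.d. n-tuples $(X_1^{(\ell)}, \ldots, X_n^{(\ell)})$, $\ell = 1, \ldots, L_n$, are sampled
from the i.i.d. sequence distribution $\mathcal{P}_Q(\cdot)$ and applied to the
sample mean estimator

$$\hat{p}_n = \frac{1}{L_n} \sum_{\ell=1}^{L_n} 1_{E_n}(X_1^{(\ell)}, \ldots, X_n^{(\ell)}) \prod_{k=1}^{n} \frac{dP}{dQ}(X_k^{(\ell)}),$$

where $E_n = \{(x_1, \ldots, x_n) : \sum_{k=1}^{n} x_k > n\gamma\}$ and $1_{E_n}(\cdot)$ is the
indicator function. Write $Z_n = 1_{E_n}(X_1, \ldots, X_n) \prod_{k=1}^{n} \frac{dP}{dQ}(X_k)$. Then,
provided $Q(\cdot) \in M_P$, we have $E_Q[\hat{p}_n] = E_Q[Z_n] = p_n$, which is to
say that $\hat{p}_n$ is an unbiased estimator for $p_n$.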

$P^{(\alpha)}(dx) = \exp(\alpha x - \Lambda(\alpha)) P(dx)$, whenever $\Lambda(\alpha) < \infty$, is
the exponentially twisted distribution for twisting parameter $\alpha$.
We show that $P^{(\theta)}(\cdot)$, where $\theta$ solves $\Lambda'(\theta) = \gamma$, has very strong
nonparametric asymptotic optimality properties as a sampling
distribution.
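
A minimal sketch of the twisted estimator may help fix ideas. It assumes standard normal summands, a case chosen here only because everything is available in closed form: $\Lambda(\alpha) = \alpha^2/2$, so $\theta = \gamma$ and the twisted distribution is simply $N(\theta, 1)$. The function names, sample size and seed are hypothetical; the abstract's own example uses a Laplacian $P$.

```python
import math
import random


def twisted_estimate(n, gamma, num_samples, seed=0):
    """Estimate P(S_n > gamma * n) for i.i.d. N(0, 1) summands by sampling from N(theta, 1)."""
    rng = random.Random(seed)
    theta = gamma                    # solves Lambda'(theta) = gamma for Lambda(a) = a**2 / 2
    log_mgf = 0.5 * theta * theta    # Lambda(theta)
    total = 0.0
    for _ in range(num_samples):
        s = sum(rng.gauss(theta, 1.0) for _ in range(n))
        if s > gamma * n:
            # prod_k dP/dQ(x_k) = exp(-theta * s + n * Lambda(theta))
            total += math.exp(-theta * s + n * log_mgf)
    return total / num_samples


def exact_tail(n, gamma):
    """Exact P(S_n > gamma * n) when S_n ~ N(0, n)."""
    return 0.5 * math.erfc(gamma * math.sqrt(n) / math.sqrt(2.0))


estimate = twisted_estimate(n=20, gamma=1.0, num_samples=20000)
exact = exact_tail(n=20, gamma=1.0)
```

The target probability here is of order $10^{-6}$, so direct simulation with the same number of samples would almost never observe the event, while the twisted sampler places about half of its samples in $E_n$.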

For a fixed integer $\nu \ge 2$ suppose that we set $L_n$ to stabilize the
$\nu$th error moment; that is, set $L_n$ so that $|E_Q[(\hat{p}_n - p_n)^\nu]| \sim c\,p_n^\nu$
with $0 < c < \infty$. For example, for $\nu = 2$ we set $L_n \sim v_n(Q)/(\epsilon^2 p_n^2)$,
where $v_n(Q) = \mathrm{var}_Q[Z_n]$, in order to achieve $\mathrm{var}_Q[\hat{p}_n] \sim \epsilon^2 p_n^2$.
In general, we show that stabilization of the $\nu$th error moment
requires sampling cost of exponential order. Specifically,

$$L_n \sim_{LD} \exp(a_\nu(\gamma; Q)\, n)$$

where $a_\nu(\gamma; Q) \ge 0$ for all $Q(\cdot) \in M_P$. Moreover, for all integers
$\nu \ge 2$,

$$a_\nu(\gamma; Q) = 0 \text{ if and only if } Q(\cdot) = P^{(\theta)}(\cdot).$$

This extends the original work by Bucklew, Ney and Sadowsky (J.
Appl. Prob., March 1990) that originally obtained the result for
the case $\nu = 2$.

Moreover, we show here that $P^{(\theta)}(\cdot)$ also asymptotically
minimizes the error moments of the sample variance estimator in the
same sense as above. This result impacts directly on the practical issue
of setting $L_n \sim v_n(Q)/(\epsilon^2 p_n^2)$. We generally do not know either $p_n$
or $v_n(Q)$ a priori. Both must be estimated. In practice, $L_n$ is
increased, either continuously or in batches, until $L_n \ge \hat{v}_n/(\epsilon^2 \hat{p}_n^2)$
where $\hat{v}_n$ is the sample variance. Our result that $P^{(\theta)}(\cdot)$
asymptotically minimizes the error moments of both $\hat{p}_n$ and $\hat{v}_n$ lends much
credibility to this practical stopping rule.

¹To appear in IEEE Trans. on Information Theory, Jan. 1993.

²This work was supported by the National Science Foundation, grant No.
9003007-NCR.


The stability of suboptimal estimators is also addressed.
Suppose that one must accept the exponential sampling cost
$L_n \sim_{LD} \exp(a_2(\gamma; Q)\, n)$ of a suboptimal $Q(\cdot)$. If $a_\nu(\gamma; Q) > a_2(\gamma; Q)$ for
some $\nu > 2$, then the $\nu$th error moment of sample mean and the
$(\nu/2)$th error moment of sample variance will be unstable. In
particular, $a_4(\gamma; Q) > a_2(\gamma; Q)$ implies that the variance of the sample
variance will be unstable, and this in turn implies instability and
poor convergence of the practical stopping rule. We say $Q(\cdot)$ is
completely asymptotically stable if $a_\nu(\gamma; Q) = a_2(\gamma; Q)$ for all
integers $\nu \ge 2$. A new result proved here is that the entire parametric
family $\{P^{(\alpha)}(\cdot) : \alpha \ge 0\} \subset M_P$ is completely asymptotically
stable. This result has important practical significance because in
some multidimensional applications it can be difficult to precisely
determine the optimal twisting parameter vector.

A simple example illustrates the importance of the stability
issue. Take $P(\cdot)$ to be the Laplacian distribution with p.d.f. $p(x) =
e^{-|x+1|}/2$, fix $n = 30$ and $\gamma = 0$. Then $p_n = \mathcal{P}_P(S_{30} > 0) \approx 10^{-4}$.
Consider  two  sampling  distributions:  the  linear  shift  with'  p.d.f. 
q{x)  =  and  The  figures  below  present  numerical 

results  of  three  independent  runs  of  each  estimator,  plotted  with 
empirical  standard  deviation  error  bars  The  linear 

shift results clearly exhibit unstable behavior. Error bars often do
not overlap, and there is a tendency to underestimate $p_n$, in some
cases, by a full order of magnitude. In contrast, observe that a
single horizontal line at $10^{-4}$ would pierce all of the error bars of
the $P^{(\theta)}(\cdot)$ estimates.


Finally, one might ask what cost $L_n \sim_{LD} \exp(a_\infty(\gamma; Q)\, n)$ is
required to simultaneously stabilize all error moments (i.e., all $\nu <
\infty$)? For the Laplacian example we compute $a_\infty(0; Q) = 1.226$.
However, for the ordinary Monte Carlo estimator $P(\cdot) = P^{(0)}(\cdot)$,
which is completely asymptotically stable, we compute $a_\infty(0; P) =
a_2(0; P) = 0.226$. Thus, surprisingly, the sampling cost required
to completely stabilize the linear shift estimator is substantially
bigger than that of ordinary Monte Carlo!
Convergence of Probability Measures for Continuous Sample Paths of Multidimensional Random Field Simulations Using Trigonometric Series

Robert  Patton  Leland 
Department  of  Electrical  Engineering 
University  of  Alabama 
Tuscaloosa,  AL  35487-0286 


We wish to simulate continuous sample paths of Gaussian random
fields that are either homogeneous or have homogeneous increments,
and show weak convergence of the sample path probability measures
on $C[0,1]^d$.

A zero mean Gaussian random field is homogeneous if its
covariance function $R$ is shift invariant, $R(p_1, p_2) = E[X(p_1)X(p_2)] =
R(p_1 + p, p_2 + p)$. In this case $R$ has a spectral representation $R(p) =
\int_{\mathbb{R}^d} \exp(i\lambda \cdot p)\, d\Phi(\lambda)$. In order to ignore slowly varying effects, we can
also describe the random field by its structure function $D(p_1, p_2) =
E[(X(p_1) - X(p_2))^2]$. A zero mean Gaussian random field has
homogeneous increments if $D(p_1, p_2)$ is shift invariant. Homogeneous
increment random fields have the spectral representation [6]

$$D(p) = 2\int_{\mathbb{R}^d} (1 - \cos(\lambda \cdot p))\, d\Phi(\lambda), \qquad \int_{\mathbb{R}^d} \frac{|\lambda|^2}{1 + |\lambda|^2}\, d\Phi(\lambda) < \infty.$$

We simulate a zero mean homogeneous Gaussian random field $X(p)$,
$p \in \mathbb{R}^d$, by a random trigonometric series

$$X_n(p) = \sum_{k=1}^{n} a_k^n \cos(\lambda_k^n \cdot p + \theta_k) \qquad (1)$$

where the weights $a_k^n$ and frequencies $\lambda_k^n$ are known and $\{\theta_k\}$ is an
iid sequence of random phases uniformly distributed on $[0, 2\pi]$. This
yields an approximate covariance function

$$R_n(p) = \frac{1}{2} \sum_{k=1}^{n} (a_k^n)^2 \cos(\lambda_k^n \cdot p) \qquad (2)$$

For proper choices of $a_k^n$ and $\lambda_k^n$, $R_n$ will approximate $R$, as the finite
sum in (2) approximates the integral in the spectral representation
of $R$. Such approximations were considered for $d = 1$ in [3], and for
random fields in [4].
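
The relation between the random-phase series (1) and the covariance (2) can be checked numerically. The sketch below is a one-dimensional illustration ($d = 1$) with arbitrary weights and frequencies chosen only for the check; the function names and the particular values are assumptions, not the paper's construction.

```python
import math
import random


def sample_path(weights, freqs, points, rng):
    """One realization of X_n at the given points, with fresh uniform random phases."""
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in weights]
    return [
        sum(a * math.cos(lam * p + th) for a, lam, th in zip(weights, freqs, phases))
        for p in points
    ]


def approx_covariance(weights, freqs, lag):
    """R_n(lag) = (1/2) * sum_k a_k**2 * cos(lambda_k * lag), as in (2)."""
    return 0.5 * sum(a * a * math.cos(lam * lag) for a, lam in zip(weights, freqs))


weights = [1.0 / k for k in range(1, 9)]   # assumed weights a_k
freqs = [float(k) for k in range(1, 9)]    # assumed frequencies lambda_k
rng = random.Random(1)
num_draws = 10000
acc = 0.0
for _ in range(num_draws):
    x0, x1 = sample_path(weights, freqs, [0.0, 0.3], rng)
    acc += x0 * x1
empirical = acc / num_draws
predicted = approx_covariance(weights, freqs, 0.3)
```

Averaging the product $X_n(0) X_n(0.3)$ over independent phase draws approaches the value given by (2), since the cross terms between different $k$ vanish in expectation and each remaining term contributes $\frac{1}{2} (a_k^n)^2 \cos(\lambda_k^n \cdot p)$.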

We simulate a homogeneous increment random field, $X(p)$, as

$$X_n(p) = \sum_{k=1}^{n} a_k^n \{\cos(\lambda_k^n \cdot p + \theta_k) - \cos(\theta_k)\} \qquad (3)$$

The weights $a_k^n$ and frequencies $\lambda_k^n$ are chosen to approximate the
structure function $D(p)$ by $D_n(p)$,

$$D_n(p) = E[(X_n(p') - X_n(p' + p))^2] = \sum_{k=1}^{n} (a_k^n)^2 (1 - \cos(\lambda_k^n \cdot p)).$$

For  the  random  fields  Xn(p)  in  (1)  and  (3),  the  even  moments  of  the 
increments  obey 

E[(X„(p')  -  X„{p'  -f-  p ))’"']  <  ^D„(pr 

If $R_n \to R$ (or $D_n \to D$) pointwise, and a condition on $D(p)$ is
satisfied, then the probability measures for $X_n$ on the space of
continuous functions $C[0,1]^d$ converge weakly to that for $X$. Convergence
of the finite dimensional distributions does not imply the sample path
measures converge weakly. Billingsley [1] gives a necessary and
sufficient condition.

Theorem 1 If the finite dimensional distributions for the random
fields $X_n$, $P(X_n(p_1) \cdots X_n(p_m))$, converge to the finite dimensional
distributions for a random field $X$ and for every $\eta > 0$ there is an $a$
and $N$ such that $P_n(|X(0)| > a) \le \eta$, $\forall n \ge N$, and for every $\eta, \epsilon > 0$
there is a $\delta$ and $N$ such that $P_n(w_X(\delta) \ge \epsilon) \le \eta$, $\forall n \ge N$, where
$w_X(\delta) = \max_{|p - p'| < \delta} |X(p) - X(p')|$, then the corresponding probability
measures $P_n$ on $C[0,1]^d$ converge weakly to the measure $P$ for $X$.

The next theorem is our main result on weak convergence.

Theorem 2 Let $X$ be a homogeneous increment (possibly
homogeneous) separable Gaussian random field on $[0,1]^d$ with structure
function $D(p)$. Let $X_n$ be the simulation for $X$ in (1) or (3). If $D(p) \le
M|p|^\beta$ for some $M, \beta > 0$, and the weights and frequencies are chosen
so $D_n(p) \to D(p)$ (and $R_n(p) \to R(p)$ for $X$ homogeneous) pointwise,
$D_n(p) \le M|p|^\beta$ for all $n \ge N$, and $\lim_{n\to\infty} \max_k |a_k^n| = 0$,
then the probability measures for $X_n$ on $C[0,1]^d$ converge weakly to
the probability measure for $X$, as $n \to \infty$.

Sketch of Proof (see [2] for details)

The finite dimensional distributions for $X_n$ converge to those of $X$
since $R_n$ (or $D_n$) converges, and the limit is jointly Gaussian by the
Lindeberg-Levy version of the Central Limit Theorem. The conditions
in Theorem 1 are satisfied for $D(p) \le M|p|^\beta$. Using a
multidimensional version of the Kolmogorov Lemma, Yadrenko [5] (Theorem 2,
p. 108), the condition on $w_X(\delta)$ can be demonstrated. □

For the structure function to obey $D(p) \le M|p|^\beta$ it is sufficient
that for some $0 < \beta \le 2$, $M > 0$


|AP 

1  +  |A|* 


d$(A)  <  00 


(4) 


To construct $X_n$ with structure function $D_n(p)$ so that $D(p)$ and
$D_n(p)$ are simultaneously bounded by $M|p|^\beta$, we partition a
sufficiently large region of a half space of $\mathbb{R}^d$ into small cubes, and let

$$a_k^n = \left(4 \int_{\text{cube } k} d\Phi(\lambda)\right)^{1/2}$$

when the cube does not touch on or contain any non-integrable
singularities. The frequencies $\lambda_k^n$ can be taken as the center points of the
cubes. If there is a non-integrable singularity at the origin, for cubes
touching the origin, we take


L 


For a large enough number of sufficiently small cubes, the structure
function is approximated as required.

The FFT algorithm quickly generates random fields using
trigonometric series if the desired sample points all lie on a rectangular grid.
If this is not the case, Gauss-Legendre quadrature yields good
approximations of the integrals for $R(p)$ and $D(p)$, and can be
modified to handle singularities. Gauss-Legendre quadrature was used
to simulate random fields with the Von Karman spectrum, and with
0(P)  =  Cl\p\^l\  R’. 
Summary

Multiresolution signal decomposition and wavelet
orthonormal bases of $L_2(-\infty, \infty)$ have received increasing
attention in recent years in the mathematical and in the signal and
image processing literatures; see [1]. A multiresolution
decomposition of $L_2(-\infty, \infty)$ is an increasing sequence $V_J$ of
closed subspaces of $L_2(-\infty, \infty)$ with dense union, trivial
intersection, and certain translation and scaling properties [1].
The approximation of a function $f \in L_2(-\infty, \infty)$ at resolution
$2^J$ is the orthogonal projection of $f$ on $V_J$, which is computed by
using a wavelet orthonormal basis for $V_J$ generated by a scale function
$\phi \in L_2(-\infty, \infty)$ by means of dilations and translations. The
simplest example is the Haar basis, where $\phi(t) = 1_{(0,1]}(t)$ has
compact support and is discontinuous. There are scale functions
which are k-times continuously differentiable with compact
support [1].

The  approximation  of  any  /e  Li(-^ ,  ~)  at  resolution  T* 
thus  has  the  orthonormal  series  representation 

/f(o=  2 

which  converges  in  ,  ~)  norm,  and  whose  coefficients  are 

•• 

.  The  Li  approximation  error  at  resolution 
is 


e?=ll/-Alll= 


ande}  -»0  as 

An $n$th order asymptotic expansion for the approximation
error $e_J^2$ as $J \to \infty$ is established in [2] for functions $f \in L_2(-\infty, \infty)$
with n derivatives. Under certain additional conditions we have

_  £i  +  ...  ^  ^ 


where  fori  i  1, 


^2*~  (^)!  1  [/*’(0)^<*|J(H-v)“^(«)^(v)diidv  . 

The  quality  of  the  approximation  can  be  improved  by  using  a 

•• 

scale  function  with  vanishing  moments:  if  |(r-^y  ^(/)dr=0  for 


some  fi  and  j  =  l,  ■  •  •  ,2(p-l)  where  \ipSnl2  then  the  dominant 
term  in  e}  has  order  2'^. 

We also consider in [2] the wavelet approximation at
resolution $2^J$ of stationary and nonstationary second-order
random processes. All stationary and most nonstationary
second-order processes do not have sample paths in $L_2(-\infty, \infty)$ and thus
they do not fit the standard framework of $L_2(-\infty, \infty)$ wavelet
representation. However, with probability one, the sample
functions of mean-square continuous stationary and nonstationary
random processes are square integrable over every finite interval.
We therefore consider the wavelet approximation of such
processes, at resolution $2^J$, over a finite interval, say $[0, T]$, and
we use a scale function $\phi$ with compact support, say $[0, N]$. A
substantial simplification occurs if we use a slightly larger data
interval $[-N 2^{-J}, T + N 2^{-J}]$ to compute the coefficients. Then
the approximation is

where the sum is actually finite involving, for each $t \in [0, T]$, the
terms with $2^J t - N \le i \le 2^J t$, and

=  / X(t.  a>m)dt  =  -^IJf (^.ffl)#(s)dr  . 

We provide an $n$th order asymptotic expansion for the integrated
mean-square approximation error,

T 

elT=EilX(i,a>)-X^t,a>)fclt. 


For stationary processes whose covariance function
$R(\tau) = R(\tau, 0)$ has n one-sided derivatives at 0, but need not be
differentiable at 0, we obtain the $n$th order asymptotic expansion,
as $J \to \infty$,


C;  =  1  - /?W(0,))  j  /  |«  -  V 1^  dv  . 

J-  0  0 

and  the  term  o(2~**)  does  not  depend  on  T.  Thus  generally  die 
dominant  term  is  of  order  2~*.  When  the  stationary  process  has  p 
quadratic-mean  derivatives  and  the  moments  of  f  of  order 
1,  -  ■  ■  ,2(p-l)  vanish  (\ipSn/2),  then  the  dominant  term  is  of  order 
2~^.  For  nonstationary  processes,  a  similar  asyiig>totic  expansion 
is  provided,  and  the  dominant  term  is  generally  of  order  2~*  Qt  is 
not  clear  in  this  case  whether  a  scale  function  can  be  matched  to  a 
q.m.  differentiable  process  in  order  to  speed  up  the  rate  of 
convergence).  This  is  also  the  case  for  deterministic  functions 
which  do  not  belong  to  Z.2(-«>,°*)  but  ate  square  integrable  over 
finite  intervals. 

For  nonstationary  processes  with  finite  mean  energy  over  the 

••  w 

entire  real  line:  E  j X^(t , e»)dt  =  Jl7(r,r)dr<«*  we  obtain  an  h* 
order  asynqrtotic  expansion  for  the  integrated  mean-square  error 
«• 

t}  =El[X{t,to)-X^U<0)fdt, 

with  properties  similar  to  those  for  deterministic  functions  in 
L2(^.-). 
ON THE MINIMUM EXPECTED
DURATION OF A COIN
TOSSING GAME

ABSTRACT

The  following  coin  tossing  game  is  analysed:  A  store  of  N 
fair  coins  is  given  and  it  is  desired  to  achieve  M  heads  in  a  round 
of  tosses  of  the  coins.  To  allow  for  unfavourable  sequences  of 
tails,  restarts  are  permitted  at  any  epoch  in  the  game  where, 
in  any  restart,  all  coins  are  returned  to  store  and  tosses  are  be¬ 
gun  anew  from  tabula  rasa.  A  restart  strategy  is  a  prescription 
which  specifies  when  a  restart  should  be  made.  It  is  desired  to 
estimate  the  minimum  expected  duration  of  the  game  over  all 
restart  strategies,  and  to  find  an  optimal  strategy  which  min¬ 
imises  the  expected  duration  of  the  game.  This  simple  coin  toss¬ 
ing  game,  proposed  by  R.  L.  Rivest,  has  cryptographic  roots  and 
is  linked  to  issues  in  the  factoring  of  integers. 

It  is  shown  that  there  exists  an  optimal  deterministic  strat¬ 
egy  which  minimises  the  expected  duration  of  the  game,  and  a 
backward  induction  algorithm  is  derived  which  efficiently  yields 
the  optimal  strategy.  The  properties  of  the  optimal  strategy  are 
characterised, and some sub-optimal strategies analysed. In
particular, it is shown that if the desired number of heads M is less
than or equal to one-half the number of coins N in the store, then
the minimum expected duration of the game grows linearly in N;
if, on the other hand, M exceeds one-half N, then the minimum
expected duration of the game grows exponentially fast in N.
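
A backward induction of the kind described can be sketched as follows. The abstract does not spell out the rules, so the sketch rests on an assumed reading: each coin is tossed once per round, duration is counted in single tosses, and the player may abandon the round after any toss. Under that reading the value of a restart is itself the unknown minimum expected duration, so the code solves for it by bisection. The function name and the bisection bracket are hypothetical.

```python
def min_expected_duration(num_coins, target_heads, iterations=200):
    """Minimum expected number of tosses to see target_heads heads within one round."""
    if not 1 <= target_heads <= num_coins:
        raise ValueError("need 1 <= target_heads <= num_coins")
    n_coins, m_heads = num_coins, target_heads

    def value_of_fresh_round(restart_cost):
        # nxt[h]: expected further tosses with h heads so far (h capped at m_heads).
        # After the last coin, any state short of m_heads forces a restart.
        nxt = [restart_cost] * m_heads + [0.0]
        for _ in range(n_coins - 1, 0, -1):
            cur = [
                min(restart_cost, 1.0 + 0.5 * nxt[h + 1] + 0.5 * nxt[h])
                for h in range(m_heads)
            ]
            nxt = cur + [0.0]
        # Restarting before the first toss gains nothing, so the first toss is always made.
        return 1.0 + 0.5 * nxt[1] + 0.5 * nxt[0]

    # The optimal value v satisfies v = value_of_fresh_round(v); bisect on the difference.
    lo, hi = 0.0, float(n_coins * 2 ** n_coins)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if value_of_fresh_round(mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, with two coins and a target of two heads the sketch returns 6, the classical expected waiting time for two consecutive heads, since any tail makes the current round hopeless and triggers a restart.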
ABSTRACT

We present the results of a simulation study for a virtual circuit connection
over an ATM network where forward error correction is performed at both the
ATM cell level and the packet data unit (PDU) level. A main conclusion of this
study is that at low loads ATM cells from the same source dominate in the
switch buffers, while at high loads there is a mixing of ATM cells from different
sources. For the latter case, ATM cell level coding performs better, while for
the former, PDU level coding performs better. The combination of the two
techniques has the best overall performance.

1  Introduction 

In broadband integrated services digital network using the asynchronous
transfer mode (B-ISDN/ATM), the end-to-end propagation delays will typically
be much larger than the duration of a packet. Consequently,
retransmissions associated with the conventional error detection and Automatic Repeat
reQuest (ARQ) mechanisms will increase the delay of a packet intolerably,
especially for loss and delay sensitive high-speed applications. Therefore,
one may use Forward Error Correction (FEC) to improve reliability without
increasing end-to-end delay. In FEC, redundant information is sent along with
the original data so that the receiver can avoid retransmissions by recovering
lost information using this redundancy. However, there is a trade-off in using
FEC: adding redundancy increases the load in the network, and in turn, the
loss rate. FEC can be useful only when the former effect prevails.

In  this  work,  we  simulated  a  long-distance  virtual  circuit  (VC)  connection 
over  an  ATM  network,  and  quantified  the  improvement  in  delay-throughput 
performance  achieved  by  using  FEC.  In  ATM,  the  basic  unit  of  transport, 
switching,  and  queueing  is  a  53-byte  cell.  ATM  cells  are  grouped  into  variable 
size  Packet  Data  Units  (PDUs)  at  the  adaptation  layer.  Some  of  the  PDUs 
arrive  at  the  receiver  with  missing  cells  due  to  buffer  overflows  at  congested 
nodes.  By  adding  parity  cells  to  each  PDU,  some  of  the  lost  cells  can  be  re¬ 
covered.  A  PDU  is  considered  lost  and  retransmitted  if  its  missing  cells  cannot 
be  recovered  by  the  FEC  mechanism.  Our  principal  motivation  is  the  fact  that 
the  nature  of  the  cell  loss  process  strongly  affects  the  performance  of  FEC. 
Adding parity cells to each PDU is effective when cells are lost “randomly." It
is not effective when the cells are lost in bursts. In such cases, interleaving
and buffer management techniques can be used. To combat burst losses, we
employ FEC over PDUs, in addition to FEC over consecutive cells, and our
results indicate this is effective. Unlike buffer management techniques
employed in intermediate nodes, this method is simpler to implement as it does
not require processing at the network nodes and can be employed selectively,
e.g., only for delay and loss sensitive applications.

2  Forward  Error  Correction  In  ATM  Networks 


In the method of FEC over consecutive ATM cells, the encoder appends M_a
independent parity cells to each group of N_a information-bearing cells. We
consider this block of N_a + M_a cells as a PDU. Since the receiver determines
the positions of lost cells by means of sequence numbers, it is possible to
design an erasure channel code so that up to M_a erased cells per PDU can
be recovered. A lost PDU with more than M_a erased cells is retransmitted
upon a time-out or a retransmission request from the receiver.
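
As a rough illustration of the erasure-recovery rule above, the following sketch computes the probability that a PDU has to be retransmitted when each of its N_a + M_a cells is lost independently with the same probability. The independence assumption, the function name and the sample loss rate are ours and not taken from the study; the sketch also ignores the extra load that the parity cells create, so it only illustrates the "random loss" case.

```python
from math import comb


def pdu_loss_probability(n_info: int, n_parity: int, cell_loss: float) -> float:
    """Probability that more than n_parity of the n_info + n_parity cells are erased.

    Hypothetical model: every cell is lost independently with probability cell_loss.
    """
    n = n_info + n_parity
    # The PDU is recoverable when at most n_parity cells are erased.
    recoverable = sum(
        comb(n, j) * cell_loss**j * (1.0 - cell_loss) ** (n - j)
        for j in range(n_parity + 1)
    )
    return 1.0 - recoverable


if __name__ == "__main__":
    # N_a = 16 as in the simulations; the 1% cell loss rate is an arbitrary example.
    for m_a in (0, 2, 4):
        print(m_a, pdu_loss_probability(16, m_a, 0.01))
```

Under this simplified model, a few parity cells sharply reduce the retransmission probability, which is the effect the cell-level code is meant to exploit.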

In ATM networks, once a node is congested, it remains in this state for
some time, resulting in consecutive cell losses. Increasing M_a is not a good
solution to this problem since it leads to higher cell loss rates and limits the
throughput. A better solution is to use a code over PDUs in addition to FEC
over consecutive cells; each block of N_p PDUs is followed by M_p independent
parity PDUs. We call this block of N_p + M_p PDUs a coding block. This can be
viewed as a two-dimensional code if consecutive cells are arranged in the form
of a matrix, each row of which contains the cells of one PDU. The issues related to
the construction of parity cells for both row and column coding are the same,
and therefore, up to M_p lost PDUs per coding block can be recovered.
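
The two recovery rules just described can be sketched as a simple counting check. The erasure matrix layout (one row per PDU, `True` marking a lost cell) and the function names are illustrative assumptions; the sketch only counts erasures and does not implement an actual erasure code.

```python
from typing import List, Sequence


def lost_pdus(erasures: Sequence[Sequence[bool]], m_a: int) -> List[int]:
    """Indices of rows (PDUs) with more than m_a erased cells.

    Such PDUs cannot be repaired by the cell-level code alone.
    """
    return [i for i, row in enumerate(erasures) if sum(row) > m_a]


def coding_block_recoverable(
    erasures: Sequence[Sequence[bool]], m_a: int, m_p: int
) -> bool:
    """The PDU-level code can restore up to m_p lost PDUs per coding block."""
    return len(lost_pdus(erasures, m_a)) <= m_p


if __name__ == "__main__":
    block = [
        [False, False, False, False],
        [True, True, True, False],   # burst: three cells of one PDU lost
        [False, True, False, False],
    ]
    print(lost_pdus(block, m_a=1), coding_block_recoverable(block, m_a=1, m_p=1))
```

A burst that wipes out most of one row defeats the row code but counts as only one lost PDU for the column code, which is why the second dimension helps against burst losses.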

3  Simulation  Results 

In  the  simulations,  we  consider  a  five-hop  VC  connection  over  a  wide  ge¬ 
ographical  area.  The  end-to-end  propagation  delay  is  taken  to  be  10,240 
slots,  where  a  slot  is  defined  to  be  the  unit  time  needed  to  serve  a  cell  at 
155  Mb/sec,  approximately  equal  to  the  delay  of  a  US-wide  or  a  Europe-wide 
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network  at  this  rate.  In  each  one  of  the  four  intermediate  nodes,  there  is  a 
non-blocking 8x8 ATM switch with output buffers of capacity B = 256 cells.
To measure the coding gain, we perform FEC on the forward traffic belonging
to a tagged source-destination pair. While passing through the network
nodes, the tagged traffic interferes with the untagged cells belonging to other
source-destination pairs. We assume that the tagged PDUs of cells arrive
at the source according to a Poisson process with rate ρ/N_a, where ρ is the
network load. The N_a + M_a cells of a tagged PDU are transmitted in successive
slots under the control of a window flow control mechanism with PDU
permits. This mechanism prevents the network from getting into a state of
ever increasing congestion since it limits the number of PDUs on the VC.
The transmitter stores each tagged PDU until an acknowledgement message
(ACK) is received from the destination. In the case of receiving a negative
acknowledgement message (NACK), the NACKed PDU is retransmitted. In real
life, ACKs and NACKs flow back through a similar, possibly the same, path
as the forward traffic follows, and hence, are subject to loss as well as random
delay. We assume for the sake of programming simplicity that they flow
back through dedicated lossless, constant-delay channels. We still maintain
a timeout mechanism. Similarly, we assume independent Poisson arrivals of
N_a-cell untagged PDUs with rate ρ/N_a at the 7 untagged input ports of each
node. Each untagged PDU chooses the tagged output port independently
with probability 1/8, and its cells depart from the VC at the downstream nodes
independently with probability 7/8. Also, the tagged and the untagged cells
are served at the same priority level. We also assume that the receiver has a
cell level memory: the successful cells of lost PDUs are stored. This feature
has a strong impact on the overall network performance since it decreases
the amount of work from one retransmission cycle to the next.
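
As a quick check of the time scale used in this setup, the sketch below converts the slot (the time to serve one 53-byte cell at 155 Mb/sec) and the 10,240-slot propagation delay into seconds. The constant names are ours, and the nominal 155 Mb/sec figure is taken as quoted in the text.

```python
CELL_BITS = 53 * 8           # one ATM cell, in bits
LINK_RATE = 155e6            # bits per second, nominal rate quoted in the text
PROPAGATION_SLOTS = 10_240   # end-to-end propagation delay, in slots

slot_seconds = CELL_BITS / LINK_RATE
propagation_seconds = PROPAGATION_SLOTS * slot_seconds

if __name__ == "__main__":
    print(f"slot duration:     {slot_seconds * 1e6:.2f} microseconds")
    print(f"propagation delay: {propagation_seconds * 1e3:.1f} milliseconds")
```

The slot comes out at a little under 3 microseconds and the end-to-end propagation delay at roughly 28 milliseconds, so a single retransmission round trip costs tens of thousands of cell times.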

In the simulations, we fixed the parameters N_a and N_p as 16 and 256,
respectively, and measured the average PDU delay, which was defined as
the average time that a tagged PDU spent in the network, as a function of
ρ. The results for the uncoded (M_a = M_p = 0), only cell coded (M_a = 4,
M_p = 0), only PDU coded (M_a = 0, M_p = 4), and both cell and PDU coded
(M_a = M_p = 4) cases are compared. The parameters M_a = 4 and M_p = 4
were chosen according to the results of two optimizations in which we tried
M_a ∈ {0,1,2,3,4,6,8,12} and M_p ∈ {0,2,4,6,8,10,12,14,16,24,32},
respectively. In the PDU coded cases, the averages were computed over
information-bearing PDUs so as to make a meaningful comparison with the
uncoded case.

The  results  show  the  trade-off  between  using  cell  coding  and  PDU  coding. 
For low ρ, where cells of a single connection dominate in the output buffers,
losses occur in rare bursts for buffer capacities as large as 256, and hence,
PDU coding outperforms cell coding. For high ρ, the frequency of burst losses
increases, and many cells from distinct connections interfere at the switch
outputs resulting in random losses. Therefore, cell coding starts to perform
better as ρ increases. The joint code outperforms only cell coding or only PDU
coding for almost all ρ, except for a small degradation around ρ = 0.45, which
is  due  to  the  individual  performance  degradation  in  cell  coding.  The  results  of 
the  optimizations  over  M^  and  Mp,  and  the  details  of  transmitter  and  receiver 
implementations  are  available  from  the  authors. 

Finally, note that if the successful cells of the lost PDUs were not stored
at the destination, the average PDU delays would be much higher for high
network loads. With a cell level memory at the destination, the lost PDUs have
to arrive at the receiver with fewer and fewer cells in successive
retransmission cycles. However, when there is no such memory, the PDUs
are subject to the same probability of loss in successive transmission cycles.
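
The effect of the cell level memory can be illustrated with a small calculation. The sketch assumes, purely for illustration, that the whole PDU is resent in every cycle and that each cell is lost independently with probability `p` (with `p < 1` and `m < n`); neither assumption comes from the study, whose losses are bursty. With memory, a cell is still missing after c cycles only if it was lost c times in a row.

```python
from math import comb


def prob_unrecovered(n: int, m: int, miss: float) -> float:
    """P(more than m of n cells are missing) when each is missing with prob. miss."""
    ok = sum(comb(n, j) * miss**j * (1.0 - miss) ** (n - j) for j in range(m + 1))
    return max(0.0, 1.0 - ok)


def expected_cycles_without_memory(n: int, m: int, p: float) -> float:
    # Every cycle is an independent trial with the same success probability.
    return 1.0 / (1.0 - prob_unrecovered(n, m, p))


def expected_cycles_with_memory(n: int, m: int, p: float,
                                tol: float = 1e-12, max_cycles: int = 10_000) -> float:
    # E[cycles] = sum over c >= 0 of P(PDU still unrecovered after c cycles);
    # after c cycles a given cell is still missing with probability p**c.
    total = 0.0
    for c in range(max_cycles):
        term = prob_unrecovered(n, m, p**c)
        total += term
        if term < tol:
            break
    return total


if __name__ == "__main__":
    # 16-cell PDU without parity cells and an (exaggerated) 30% cell loss rate.
    print(expected_cycles_without_memory(16, 0, 0.3))
    print(expected_cycles_with_memory(16, 0, 0.3))
```

Under these assumptions the memoryless receiver needs hundreds of cycles on average at this loss rate, while the receiver with memory needs only a handful, which is consistent with the qualitative statement above.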

4  Summary  and  Conclusions 

We have presented the results of a simulation study, showing that the use
of forward error correction improves the performance of broadband networks.
We have concentrated on the performance over a virtual circuit connection
over an ATM network. The FEC technique is based on transmitting parity
packets, which are constructed by using an erasure channel code, along
with information-bearing packets. Although this may increase the network
load, leading to higher packet loss rates and limiting the network throughput,
retransmissions are avoided provided that sufficiently many packets reach
the destination. In particular, we have considered two types of coding: coding
over consecutive ATM cells and coding over consecutive fixed-length PDUs.
The simulation results obtained have confirmed our a priori expectation that
coding over PDUs would be effective for burst cell losses, and indicated that,
by using FEC with correct parameters, it is possible to reduce the average
PDU delays by approximately one half.


A Robust Error Control System For Broadcast Channels


S. Ram Chandran, TeleSciences, Fremont, CA; Shu Lin, Univ. of Hawaii


Abstract 

We have proposed a robust error-control system for
a broadcast environment. The system is a combination
of FEC and ARQ schemes. We use a cascaded coding
scheme with a binary inner code and an interleaved
non-binary outer code. The decoding policy is chosen
so that we make optimum use of the outer code's error-correcting
capability. The system also has a parity retransmission
feature. We use a dynamic programming optimization
method to optimize the throughput. We show
that this system achieves very high reliability and high
channel utilization even at extremely high bit-error-rates
(<  10~^).  The  scheme  is  suitable  for  high  speed  data 
transfer  even  when  the  channels  are  noisy. 

1  Summary 

In  this  paper  we  present  a  robust  error-control  system  for  broad¬ 
cast  channels.  The  communication  environment  for  the  pro¬ 
posed  scheme  consists  of  a  single  transmitter  broadcasting  mes¬ 
sages  to  R  receivers.  As  a  special  case,  this  system  can  be  used 
for  point-to-point  communications  also.  The  goal  of  this  system 
is  to  facilitate  high  speed  data  transfer  even  when  the  channels 
are very noisy. The coding scheme used here is similar to the
work of Kasami et al. [1]. We have modified their cascaded
coding scheme and also added a parity retransmission feature
to it. The coding scheme is obtained by cascading two error-correcting
codes: the inner code and the outer code. The inner
code is a binary code designed for simultaneous error correction
and  detection.  The  outer  code  is  obtained  by  interleaving  a  non¬ 
binary  code  with  symbols  from  Galois  Field  GF(7f).  This  code 
is designed for correcting symbol errors and erasures. The errors
handled by this code are either caused by the channel or the
inner  code  decoder,  whereas  the  erasures  are  introduced  by  the 
inner  code  decoder  only.  The  interleaving  facilitates  burst-error 
correction. 

In  addition,  we  also  have  a  parity  retransmission  feature  for 
the  recovery  of  incorrectly  received  messages.  The  retransmis¬ 
sion  scheme  is  a  combination  of  type-1  and  type-2  hybrid  ARQ 
schemes  [3].  We  use  a  selective-repeat  mode  of  retransmission 
and  the  protocol  is  similar  to  the  one  proposed  by  Chandran  and 
Lin  [2].  In  this  retransmission  protocol,  the  number  of  copies  of 
a  message  at  any  given  stage  of  retransmission  is  chosen  to  op¬ 
timize  the  throughput,  using  dynamic  programming  optimiza¬ 
tion.  This  optimization  scheme  takes  into  account  the  number 
of  previous  transmissions  of  the  message  and  also  the  number 
of  receivers  that  are  yet  to  acknowledge  the  message,  thereby 
allowing  us  to  achieve  the  maximum  possible  throughput. 

Parity  blocks  for  retransmissions  are  formed  based  on  the 
original  data  block  and  a  half  rate  invertible  code.  This  code  is 


obtained by shortening the inner code. Therefore, we can use
the inner-code encoder and decoder for encoding and decoding
of this code. Moreover, this code has the same error-correcting
capability as the inner code but is used on a much smaller portion
of the data block, so it is very powerful. The data
and parity transmissions both use the same format; hence their
decoding procedures are very similar. The original message can
also be recovered from the parity block, by inversion, if the decoding
is successful. Otherwise, the parity and data blocks are
combined for further error correction using the half-rate code. If
this combined decoding fails to recover the original message,
then another retransmission is requested. The
next retransmission is a data block. Thus the retransmissions
alternate between data and parity blocks.

The  key  idea  of  this  coding  scheme  is  that  the  decoding  infor¬ 
mation  is  passed  on  from  inner-code  decoder  to  the  outer-code 
decoder  [1].  Data  and  parity  blocks  consist  of  codewords  from 
inner and outer codes arranged in an array. The decoding process for
these blocks consists of inner decoding followed by outer decoding
on each of these codewords. The decoding policy is chosen
in  such  a  way  that  the  error  correcting  capability  of  the  outer 
code  is  utilized  optimally. 

We  provide  a  complete  analysis  of  the  performance  of  this 
coding  scheme.  We  show  that  even  with  very  simple  inner  and 
outer  codes,  we  can  achieve  a  probability  of  block  decoding  error 
of  10“*®  at  extremely  high  bit-error-rates  (<  10“*).  At  lower  bit¬ 
error-rates,  the  achievable  probabilities  of  block  decoding  error 
are  negligibly  small.  Moreover,  such  high  reliability  is  achieved 
at very acceptable levels of channel utilization. The corresponding
probabilities of decoding failure (retransmission) are small.
Using  the  proposed  scheme,  we  can  achieve  up  to  20%  channel 
utilization  with  a  hundred  receivers  at  a  Bit-Error-Rate=  10“*. 
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The  capacity  (i.e.,  the  max.  total  throughput)  per 
channel  of  code  division  multiple  access  with  su¬ 
perimposed  codes  as  hop  sequence  sets  is  consid¬ 
ered.  The  access  to  a  channel  is  controlled  slot-by- 
slot  by  a  binary  sequence  for  time  hopping.  One 
out  of  Q  hop  frequencies  is  selected  slot-by-slot  by 
the  symbols  of  a  Q-ary  sequence  in  the  considered 
(simplest) version of frequency hopping. The number
of (single server) channels is K = 1 for time, and
K = Q > 1 for frequency hopping. The length of
the hop sequence (the frame length) in bits is denoted
by N in the binary as well as in the Q-ary
case; m bits are conveyed during an active slot.
Random  delay,  erasures,  and  erasure  correction  are 
assumed,  but  no  independent  additive  noise. 

Task A: z bits per frame from at most M window
active sources have to be served without error out
of T potential sources. Task B: a Poissonian source
population has to be served, with a demand rate λ,
each source having at most once a packet of y bits to
send next to a demand (z = y/u bits are conveyed
per frame). For simplicity, for Task B the same
single hop sequence is assumed at each potential
source, and the ε-capacity per channel C(ε) (with
probability 1 − ε, error-free transmission for up to
M window active sources) is considered.

It was shown by Massey [1982], Bassalygo and
Pinsker [1983], Tsybakov and Likhanov [1983],
Massey and Mathys [1985], for the basic Task A,
that C → 1/e, and by Csibi [1991], for Task B, that
C(ε) → 1/e and ε → 0, as z → ∞, under distinct
constraints, all for time hopping.

It can be shown that for investigating the joint
possibilities of separation and error-free decoding the
disjunction of all possible cyclic shifts of the binary
(resp. that of the binary representation of the
Q-ary) hop sequences, modulo N, is of our interest.
Accordingly, the capacity study may be formulated,
in the noiseless case, as an extremal additive
set problem, with disjunction as addition. (For the
basic formulation and a basic bibliography of such
problems see, e.g., a survey by Sos [1989].)

It  can  be  expected  already  from  the  aforemen¬ 
tioned  studies  that  while  the  additive  set  models 
of  our  interest  for  Tasks  A  and  B  (time  as  well 
as  frequency  hopping,  slot  synchronous  as  well  as 


slot  asynchronous  arrivals)  are  under  general  cir¬ 
cumstances  essentially  distinct,  one  may  get  close  to 
the  very  same  capacity  limit  1/e  under  loose  (how¬ 
ever  more  or  less  individually  tailored)  conditions. 
One  of  our  purposes,  in  this  paper,  is  to  prove  that 
this  is  really  the  case.  Another  aim  is  to  show  what 
a  peculiar  interrelation  holds  among  the  additive 
families  underlying  the  various  distinct  versions  of 
the  multiple  access  problems  considered;  and  try 
to  point  out,  also  qualitatively,  under  what  kind  of 
constraints (on the multi-user objectives) can the
capacity  be  kept  close  to  1/e. 

Transparent upper and lower bounds are given on
C (resp., C(ε)), for the aforementioned eight problems,
under distinct constraints on M, m and
(and also on T, for Task A), w.r.t. z. Both quantities
are close to 1/e, for an appropriately large value
of z (e.g., z > 100 bits), and approach 1/e (with
ε → 0 for C(ε)) as z → ∞.

As a matter of fact, the capacity (resp., ε-capacity)
per channel may be kept close to the limit
1/e for very distinct classes of the considered joint
separation and transmission problems if the objective
is an essential point-to-point message transmission
together with joint frame separation (and by far
not just separation). This feature is consistent with
the fact that the families of the extremal sets corresponding
to the aforementioned eight distinct extremal
set problems (with points defined by a trade-off
between the resources for joint frame separation
and point-to-point transmission, and with
a shortest hop sequence of length N') have got a
peculiar joint structure, resembling the corolla
of a petaled flower. Viz., the capacity (resp. ε-capacity)
sets for these eight distinct problems have
got a common limit point 1/e; C → 1/e (resp.
C(ε) → 1/e with ε → 0) as z → ∞.

The constraints for Tasks A as well as B are met,
e.g., by the RS hop sequence constructions due to A,
Györfi, and Massey [1992] of actual interest. Thus
one may get by zMe/N a good idea about the absolute
efficiency η = N'/N of such (and also of other)
well implementable constructions.


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SUMMARY 

Recent  advances  in  optical  technology  have  made  it  possible 
to  transmit  at  very  high  data  rates.  Consequently,  the  propaga¬ 
tion  delay  for  a  packet  of  information  is  long  compared  to  the 
length  of  a  packet.  For  example,  consider  a  wide  area  network 
with  a  single  star  topology,  such  that  the  stations  are  located 
50  kilometers  from  the  hub,  packets  are  1000  bits  long  and  the 
transmission  rate  is  a  gigabit  per  second.  The  propagation  delay 
from  one  station  to  another  is  dictated  by  the  speed  of  light  in 
glass.  The  delay  is  about  500  microseconds,  roughly  500  times  as 
long  as  the  transmission  time  of  one  packet.  In  contrast,  classic 
protocols  such  as  the  ALOHA  protocol  were  investigated  with  a 
propagation  delay  roughly  12  times  the  packet  length  in  mind. 
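
The arithmetic behind the star-network example can be checked in a few lines. The propagation speed of 2 x 10^8 m/s in glass (roughly the vacuum speed of light divided by a refractive index of 1.5) is our assumption; the distances, packet length and line rate are those of the example.

```python
FIBER_SPEED = 2.0e8    # m/s, assumed speed of light in glass
HUB_DISTANCE = 50e3    # m, distance from each station to the hub
PACKET_BITS = 1000     # bits per packet
LINE_RATE = 1e9        # bits per second

# A packet travels station -> hub -> station, i.e. twice the hub distance.
propagation = 2 * HUB_DISTANCE / FIBER_SPEED
transmission = PACKET_BITS / LINE_RATE
ratio = propagation / transmission

if __name__ == "__main__":
    print(f"propagation delay: {propagation * 1e6:.0f} microseconds")
    print(f"transmission time: {transmission * 1e6:.0f} microseconds")
    print(f"ratio:             {ratio:.0f}")
```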

The particular model discussed in this paper is now described.
Newly generated packets arrive according to a Poisson process
with rate $\lambda$. Time is divided into slots of unit length, where time
is normalized so that one packet can be transmitted in one slot.
We denote by slot $i$ the time interval $[i, i+1)$. Those packets
with generation times in the set $B_i$ are transmitted during slot
$i$. We require that $B_i \subset [0, i)$, for a packet can't be transmitted
until the first full slot after it arrives. The outcome of slot $i$,
denoted by $\theta_i = \theta(B_i)$, satisfies $\theta_i \in \{0, 1, 2\}$. If no packets are
transmitted in slot $i$, then $\theta_i = 0$. If one packet is transmitted in
slot $i$, then the packet transmission will be successful and $\theta_i = 1$.
If two or more packets are transmitted in slot $i$, then the packets
will collide, the transmission will not be successful, and $\theta_i = 2$.

There  are  two,  often  the  same,  propagation  delays  associated 
with  the  model — the  propagation  delay  of  feedback  and  the  prop¬ 
agation  delay  in  the  forward  channel.  The  propagation  delay  of 
feedback is denoted by the positive integer $N$. The outcome $\theta_i$
is assumed to be announced to all stations by time $i + N$. Thus,
we require $B_{i+N}$ to be a function of $(\theta_0, \ldots, \theta_i)$. The usual
model, in which the outcome of slot $i$ is known by the beginning
of slot $i + 1$, corresponds to $N = 1$. We define the transmission
delay of a packet to be the number of whole slots that elapse between
the time the packet is generated and the beginning of the
slot in which the packet is first successfully transmitted. With
this definition, the delay is a nonnegative integer value, and it
does not include the forward propagation delay.
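
The slot outcome and the delay definition above can be sketched as two small functions. The function names are ours, and we assume that the outcome value 2 encodes a collision, since the outcome takes values in {0, 1, 2} and the values 0 and 1 are used for an idle slot and a success.

```python
import math
from typing import Iterable


def slot_outcome(generation_times: Iterable[float], slot: int) -> int:
    """Outcome of a slot: 0 (idle), 1 (success) or 2 (collision, assumed encoding).

    generation_times holds the generation times of the packets sent in this
    slot; each must lie in [0, slot), since a packet cannot be transmitted
    before the first full slot after it arrives.
    """
    times = list(generation_times)
    if any(not (0 <= t < slot) for t in times):
        raise ValueError("packets must be generated before the slot starts")
    return min(len(times), 2)


def transmission_delay(generation_time: float, success_slot: int) -> int:
    """Whole slots elapsed between generation and the start of the successful slot."""
    if not generation_time < success_slot:
        raise ValueError("a packet cannot succeed before it is generated")
    return math.floor(success_slot - generation_time)


if __name__ == "__main__":
    print(slot_outcome([], 3), slot_outcome([0.5], 3), slot_outcome([0.1, 0.2], 3))
    print(transmission_delay(2.3, 3), transmission_delay(2.3, 7))
```

A packet generated at time 2.3 and sent successfully in slot 3 has delay 0, which shows why the delay is a nonnegative integer.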

By  finding  a  lower  bound  on  the  probability  that  a  typical 
packet  will  be  successfully  transmitted  within  the  first  N/2  slots, 
we  also  find  a  lower  bound  on  the  mean  delay  suffered  by  a  typical 
packet. 
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Proposition  1  Under  any  random  access  algorithm, 

$P(N/2) > (.618)^{1/\lambda}$, (1)

where $P(N/2)$ is the probability that a typical packet suffers delay $N/2$ or
more. In particular, the mean delay suffered by a typical packet
is at least $0.5N(.618)^{1/\lambda}$.

An upper bound on the achievable mean delay of a typical
packet is obtained by considering specific random access algorithms.
Given $k \ge 1$ let
$\lambda_{\max}(k) = \max\{G[1 - (1 - \exp(-kG))^k] : G > 0\}$.
Given $\lambda$ with $0 < \lambda < \lambda_{\max}(k)$, let $G_0$ be the minimum
positive solution to the equation

$\lambda = G_0[1 - (1 - \exp(-kG_0))^k]$. (2)

Finally, let $\gamma_0 = (1 - \exp(-kG_0))^k$ and $d_0(k,\lambda) = \gamma_0/(1 - \gamma_0)$.
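
The upper-bound coefficient can be evaluated numerically. The sketch below reads equation (2) as lambda = G[1 - (1 - exp(-kG))^k] and finds its smallest positive root by a coarse scan followed by bisection; the solver, the step size and the function names are our own choices and are not part of the paper.

```python
import math


def throughput(G: float, k: int) -> float:
    """Right-hand side of equation (2) as read here: G[1 - (1 - exp(-kG))^k]."""
    return G * (1.0 - (1.0 - math.exp(-k * G)) ** k)


def smallest_root(lam: float, k: int, step: float = 1e-3, g_max: float = 10.0) -> float:
    """Smallest positive G with throughput(G, k) = lam (scan, then bisection)."""
    lo, g = 0.0, step
    while g <= g_max:
        if throughput(g, k) >= lam:
            hi = g
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if throughput(mid, k) >= lam:
                    hi = mid
                else:
                    lo = mid
            return hi
        lo = g
        g += step
    raise ValueError("lam is not below lambda_max(k)")


def d0(k: int, lam: float) -> float:
    """Coefficient gamma_0 / (1 - gamma_0) with gamma_0 = (1 - exp(-k G_0))^k."""
    g0 = smallest_root(lam, k)
    gamma0 = (1.0 - math.exp(-k * g0)) ** k
    return gamma0 / (1.0 - gamma0)


if __name__ == "__main__":
    print(d0(1, 0.30), d0(2, 0.25))
```

For (k, lambda) = (1, 0.30) and (2, 0.25) this reading gives values close to the corresponding upper-bound entries of Table 1, which supports the reconstruction of the garbled formula.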

Proposition 2 There exists a family of random access algorithms
parameterized by $k$, $\lambda$, $N$ so that if $D(k,\lambda,N)$ is the average
delay (exclusive of the forward propagation delay) of a typical
packet, then $\lim_{N \to \infty} D(k,\lambda,N)/N = d_0(k,\lambda)$.


Table 1: Comparison of lower and upper bounds on the coefficient
of $N$ in the mean delay, for some values of $\lambda$ and
optimal values of $k$.

$\lambda$    lower bound    optimal $k$    upper bound $d_0(k,\lambda)$
0.05    0.0000336      16           0.0000716
0.10    0.00410        7            0.00861
0.15    0.0203         5            0.0506
0.20    0.0452         3            0.138
0.25    0.0731         2            0.293
0.30    0.101          1            0.631
0.35    0.127          1            1.05
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CONSTRUCTIONS  OF  PROTOCOL  SEQUENCES  FOR 
MULTIPLE  ACCESS  COLLISION  CHANNEL 
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A,  Gyorfi  and  Massey  [1]  have  given  a  general  way 
to  construct  constant-weight  cyclically  permutable 
codes. A cyclically permutable code $CPC(N, T, d_c)$
is a binary block code with block length $N$, size
$T$ and positive cyclic minimum distance $d_c$. The
cyclic minimum distance $d_c$ of a code is defined as
the  minimum  Hamming  distance  from  a  codeword 
to  its  own  cyclic  shifts  or  to  some  cyclic  shift  of 
another  codeword. 

Let $\alpha$ be a primitive element of $GF(p^r)$, where
$p$ is a prime number and $r \ge 1$. A primitive
BCH code $V$ of length $n = p^r - 1$ is then
defined by the parity-check polynomial
$h(x) = \mathrm{l.c.m.}\{M_0(x), M_1(x), \ldots, M_{k-1}(x)\}$, where $M_j(x)$
is the minimal polynomial of $\alpha^j$ over $GF(p)$, and
$3 \le k < p - 1$, $j = 0, 1, \ldots, k - 1$. $V$ is given by the
direct sum

$V = V_0 + V_1 + V_2 + \ldots + V_{k-1}$,

where $V_j$ is the code over $GF(p)$ of length $n$ with
parity check polynomial $M_j(x)$, $j = 0, 1, \ldots, k - 1$.
Because $M_1(x)$ is a primitive polynomial, $V_1$ contains
an m-sequence $c^*$.

Consider the following subcode of $V$:

$V' = \{c^*\} + V_2 + \ldots + V_{k-1}$.

If the pulse-position-modulation (PPM) code consists
of all weight-one sequences of length $p$, then
let $B^*$ be the cyclic concatenation of $V'$ and the
PPM code, defined in [1]. It is shown that $B^*$ is a
binary constant-weight cyclically-permutable code
with length $p(p^r - 1)$, size $p^{(k-2)r}$, and cyclic minimum
distance $d_c \ge 2(p^r - 1 - (k - 1)p^{r-1})$.

The set $\{s_1, s_2, \ldots, s_T\}$ of binary sequences is
said to be a $(T, M, N, \sigma)$ protocol sequence set
if these sequences all have length $N$ and, when
used as protocol sequences for the multiple access collision
channel without feedback, have the property
that each active user can be identified by the receiver,
the receiver can synchronize, and each active
user achieves at least $\sigma$ successful packet transmissions
during the protocol sequence length, provided
that at most $M$ out of the $T$ users are
active. For any integer $\sigma$ with $1 \le \sigma \le w$,

‘Technical  University  of  Budapest,  Stoczek  u.  2,  H-1521 
Budapest,  Hungary 


a binary constant-weight-$w$ cyclically-permutable
code $CPC(N, T, d_c)$ is a $(T, M, N, \sigma)$ protocol-sequence
set for

$M = \min\left\{ \left\lfloor \frac{w - 1}{w - d_c/2} \right\rfloor, \left\lfloor \frac{w - \sigma}{w - d_c/2} \right\rfloor + 1 \right\}$,

where $\lfloor \cdot \rfloor$ denotes rounding down to the nearest integer ([1]).

If the total information transmission rate $R_{sum}$ is
defined by

$R_{sum} = \frac{M\sigma}{N}$ (packets/slot),

and the code $B^*$ is used as a $(T, M, N, \sigma)$ protocol
sequence set, then the parameters are as follows:

$T = p^{(k-2)r}$, $N = p(p^r - 1)$,

$M > \frac{w - \sigma}{(k - 1)p^{r-1}}$, $R_{sum} > \frac{\sigma(w - \sigma)}{N(k - 1)p^{r-1}}$,

the maximum of which is obtained for $\sigma = w/2$ under
the condition $w/2 - 1 > (k - 1)p^{r-1}$. Choosing
$\sigma = w/2$, we get

$R_{sum} > \frac{w^2}{4N(k - 1)p^{r-1}} \approx \frac{1}{4(k - 1)}$

for large $p$. The ratio of the total population $T$ to
the block length $N$ is

$\frac{T}{N} = \frac{p^{(k-2)r}}{p(p^r - 1)}$.

For $k = 3$, this ratio is $\approx p^{-1}$. For fixed $k > 3$, this
ratio is a monotone increasing function of $r$ and is

$\approx p^{(k-3)r-1}$.
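
For concreteness, the parameters of the code B* and the rate bound can be tabulated for small p, r and k. The sketch assumes that the codeword weight is w = p^r - 1 (one pulse per symbol of the length-(p^r - 1) outer codeword) and reads the garbled bounds as d_c >= 2(p^r - 1 - (k - 1)p^(r-1)) and R_sum > sigma(w - sigma) / (N(k - 1)p^(r-1)); the function names and the sample values are ours.

```python
from fractions import Fraction


def bstar_parameters(p: int, r: int, k: int) -> dict:
    """Length N, size T, assumed weight w and distance bound of the code B*."""
    w = p**r - 1  # assumption: one pulse per outer-code symbol
    return {
        "N": p * (p**r - 1),
        "T": p ** ((k - 2) * r),
        "w": w,
        "dc_bound": 2 * (w - (k - 1) * p ** (r - 1)),
    }


def rate_lower_bound(p: int, r: int, k: int, sigma: int) -> Fraction:
    """Bound sigma (w - sigma) / (N (k - 1) p^(r-1)) on the total rate R_sum."""
    prm = bstar_parameters(p, r, k)
    return Fraction(sigma * (prm["w"] - sigma), prm["N"] * (k - 1) * p ** (r - 1))


if __name__ == "__main__":
    prm = bstar_parameters(5, 2, 3)
    print(prm)
    print(rate_lower_bound(5, 2, 3, prm["w"] // 2), Fraction(prm["T"], prm["N"]))
```

With p = 5, r = 2 and k = 3 the bound at sigma = w/2 is already close to 1/(4(k - 1)) = 1/8, while the ratio T/N is about 1/p.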

[1] N. Q. A, L. Gyorfi, and J. L. Massey, "Constructions
of binary constant-weight cyclic codes and
cyclically permutable codes," IEEE Trans. on Information
Theory, vol. 38, pp. 490-499, May 1992.
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ABSIBACI 

In this paper, certain generalizations of the collision
channel without feedback are presented, based on the
original work of Massey and Mathys. We are
concerned with situations in which, given a collision in
a slot, the channel capacity is a non-zero, decreasing
function of the number of users involved. This
corresponds, for example, to spread-spectrum type of
signaling. Due to this model, concatenated coding
schemes are employed to efficiently exploit the time-varying
nature of the channel, using reliability
information from the inner code decoding process in
the outer code decoding process. Results concerning
capacity and coding/decoding tradeoffs will be
presented.


SUMMARY 

The collision channel without feedback was
thoroughly investigated by Massey and Mathys in [1].
Massey and Mathys showed that the asymptotic
throughput of this channel, as the number of users
tends to infinity, approaches e^{-1}, independently of
the kind of network operation (synchronous or
asynchronous). In their constructive proof [1], they
presented specific protocol sequences and maximum-erasure-burst-correcting
(MEBC) codes to achieve this
maximum throughput.
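
A simple way to see where the e^{-1} limit comes from is the symmetric case in which each of M users transmits in a slot independently with probability 1/M. This slot-synchronous random-access simplification is our illustration only; it is not the protocol-sequence construction of Massey and Mathys, although the resulting expression (1 - 1/M)^(M-1) has the same limit.

```python
import math


def symmetric_throughput(num_users: int) -> float:
    """P(exactly one of num_users transmits) when each sends with prob. 1/num_users."""
    q = 1.0 / num_users
    return num_users * q * (1.0 - q) ** (num_users - 1)


if __name__ == "__main__":
    for m in (2, 10, 100, 1000):
        print(m, symmetric_throughput(m))
    print("1/e =", math.exp(-1))
```

The throughput decreases from 1/2 for two users toward 1/e as the number of users grows.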

In this paper, we attempt certain generalizations of [1].
Our main scope is to consider systems in which
collisions do not, in general, lead to a totally useless
channel of zero capacity. In this case, the channel
capacity available during a collision is a function of the
number of interfering users during the collision. This
model applies, for example, in Code Division Multiple
Access (CDMA) systems, and in capture systems, in
which there are power variations in the received
signal powers of different users.

We show that the protocol sequence construction
presented in [1] can be adopted effectively by our
model, to provide efficient channel accessing,
independently of time shifts between users' signals.
Due to the time-varying, non-zero channel capacity
during a collision, more complex coding schemes, like
for example, concatenated schemes with generalized
minimum distance decoding [4], will be considered.
This is dictated by the fact that packets are not fully
destroyed in a collision. Thus, an inner code will output
reliability information for decoding the (outer code)
superpackets. In fact, due to the 'softer' collisions in
our case, the overall coding/decoding problem
becomes more complex.




We will study three scenarios suitable for application of
the above mentioned model: Frequency-Hopping
Spread-Spectrum (FH/SS), Direct-Sequence Spread-Spectrum
(DS/SS), and unspread signaling in which
power variations in the arriving signals arise due to
path loss and fading.

Finally, because the code alphabet size required for
achieving capacity in this generalized version of the
collision channel is very high, we examine a potential
application of binary codes together with interleaving
[5], as a means of achieving 'realistic' but non-optimum
throughput in the channel.
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Abstract 

The problem of designing codes for M users that
permit any T < M users to transmit at the same
time is investigated for the collision channel. Twelve
communication problems are considered that vary according
to the degree of synchronisation among users,
the receiver's knowledge of the active users, and the
desired reliability of the code. For each problem, the
T-of-M user capacity region is determined and constructive
coding schemes that approach any rate in
this region are presented. Applications to random
access communications are discussed.

Summary 

The collision channel without feedback models
the communication of many transmitters with a common
receiver through a shared packet broadcasting
channel. Massey and Mathys [1] determined the T-user
capacity regions of this channel for asynchronous
and slot-synchronous users, and also gave constructive
codes that approach all rates in these regions.

This paper considers the design of codes for M
users that permit any subcollection of up to T of the
M to transmit at the same time. Twelve communication
problems are posed that vary according to
whether the users are synchronous, slot-synchronous,
or asynchronous, the active users are known or
unknown in advance to the receiver, and the error is
desired to be zero-error or arbitrarily small error.

Theorem 1: For the T-of-M user collision channel,
all slot-synchronous and asynchronous capacity
regions coincide. This common capacity region
consists of all $R = (R_1, \ldots, R_M)$ such that

for some $q = (q_1, \ldots, q_M)$ satisfying $0 \le q_i \le 1$, $q_{[1]} + \cdots + q_{[T]} = 1$,
$q_{[k]} = q_{[T]}$ for $T < k \le M$, and where
$q_{[1]}, \ldots, q_{[M]}$ are the $q_i$ arranged in decreasing order. □

Remark: The slot-synchronous, known-user capacity
region was obtained earlier in [2].

For the T-of-M capacity region $\mathcal{R}$, the symmetric
capacity is $C_{\mathrm{sym}} = \sup\{\, r : (r/T, \ldots, r/T) \in \mathcal{R} \,\}$.

Supported in part by ARO Grant DAAL03-89-K-0130.


Corollary: The T-of-M user symmetric capacity is

$C_{\mathrm{sym}} = (1 - 1/T)^{T-1}$ packets/slot,

regardless of whether the users are slot-synchronous
or asynchronous, whether or not the active users
are known in advance to the receiver, and whether
arbitrarily small error or zero error is desired. □

Since $C_{\mathrm{sym}}$ does not depend on M, adding users
does not reduce capacity.
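
As a quick numerical illustration, the sketch below assumes that the slot-synchronous/asynchronous symmetric capacity has the Massey-Mathys form $(1 - 1/T)^{T-1}$ packets/slot. The function name `symmetric_capacity` is hypothetical and chosen only for this example. Under that assumption the value depends on T alone and decreases toward $1/e$ as T grows.

```python
import math


def symmetric_capacity(T: int) -> float:
    """Assumed symmetric capacity (1 - 1/T)**(T - 1), in packets/slot.

    T is the maximum number of simultaneously active users. The formula
    is an assumption of this sketch; M does not appear in it.
    """
    if T < 1:
        raise ValueError("T must be a positive integer")
    return (1.0 - 1.0 / T) ** (T - 1)


if __name__ == "__main__":
    for T in (1, 2, 3, 5, 10, 100):
        print(T, symmetric_capacity(T))
    print("1/e =", 1.0 / math.e)
```

Because M is absent from the expression, increasing the user population while T stays fixed leaves the computed value unchanged.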

Theorem 2: For the T-of-M user collision channel,
all synchronous capacity regions coincide. This common
capacity region is the set of all $R = (R_1, \ldots, R_M)$
that satisfy

$R_i \le \min E\{ Z_i (1 - Z_{j_1}) \cdots (1 - Z_{j_{T-1}}) \}$

for some binary random variables $Z_1, \ldots, Z_M$. □

Theorem 3: In the synchronous case, the T-of-M
symmetric capacity is

$C'_{\mathrm{sym}} =$  packets/slot,

where $K = \lceil (M + 1)/T \rceil - 1$, regardless of whether
or not the receiver knows in advance the set of active
users, and regardless of whether small error or zero
error is desired. □

Remark: Symmetric, uncoded TDMA achieves a
symmetric rate of T/M. This is optimal if and only
if $T < M < 2T$.

For each capacity region, constructive codes that
approach all rates in these regions are given. Applications
to random-access communications will be
discussed.

In this paper we present a new approximate model for the analysis
of systems of interacting queues which often arise in multiple access network
protocols. This new model is a refinement of an existing model developed in
[1] for the ALOHA multiple access protocol. We begin by applying this model
to the analysis of a multiple-node broadcast algorithm for a mesh network,
which was presented in [2]. We then show how our model can be used to
study the performance of the ALOHA multiple access protocol.

A multiple-node broadcast is a common task in the execution of parallel
algorithms in a network of processors, where every processor may have
a message to be broadcast to all other processors. In [2] an algorithm was
developed which performs periodic, synchronized broadcast cycles, where
during each cycle only a small number of nodes are allowed to broadcast
their message. Consider an N by N mesh, where each node has exogenous
packets arriving (to be broadcast) independently according to a Poisson random
process and placed in infinite-capacity queues. Our broadcast algorithm
works as follows: we partition the mesh into N vertical rings, such that each
node belongs to exactly one ring. At the beginning of every broadcast cycle
each ring selects, at random, up to d packets to be broadcast throughout the
mesh. The broadcast of the d packets from each ring is performed and has
a fixed duration of $(d+1)(N-1)$ time slots. Clearly, the queues at the N
nodes on each ring are highly dependent on each other. In fact, the queue
sizes of the N nodes on each ring form an N-dimensional infinite Markov
chain. Obtaining analytic expressions for the steady-state behavior of such
a system is very difficult. Even a numerical evaluation of such systems can
be computationally prohibitive. A similar difficulty arises in the analysis of
the ALOHA multiple access protocol, and no exact analysis for packet delay is
known for that case either. Several approximate models have been proposed
for the analysis of ALOHA which may be useful in analyzing our system.

In  [1],  Ephremides  and  Saadawi  developed  an  approximate  model  for 
a  system  of  interacting  queues  for  analyzing  the  ALOHA  protocol.  In  their 
model  they  approximate  a  system  of  N  infinite  queues  as  a  single  dimensional 
infinite  Markov  chain  representing  the  state  of  one  user  together  with  an  N- 
dimensional  finite  Markov  chain  representing  the  state  of  the  rest  of  the 
system. They use parameters from the solution of one chain in analyzing
the  other  and  solve  the  two  chains  together  using  an  iterative  algorithm. 
This  two-chain  approach  tracks  the  interaction  between  the  different  users 
in  a  system  model  that  can  be  analyzed.  We  develop  a  similar  approximate 
model  for  the  system  of  interacting  queues  in  the  mesh  broadcast  case. 

One  Markov  chain  in  our  model,  termed  the  user  chain,  represents 
the  queue  size  for  a  single  user.  It  is,  therefore,  an  infinite  chain.  Packets 
arrive  according  to  a  Poisson  random  process  and  depart  only  when  this  node 
is chosen for service. We denote the probability that this node is chosen for
service by $P_s$ and show that the delay, $D$, can be expressed as

$$D = \frac{S}{2} + \frac{\lambda S\,(2 - \lambda S)}{2\,(P_s - \lambda S)},$$

where $S$ is the cycle duration, which is equal to $(d+1)(N-1)$. The missing
ingredient in this expression, $P_s$, is the one term that can be obtained from
the other chain in our model, termed the system chain.

The system chain represents the number of non-empty nodes on one
ring (the ring containing our node of interest). Clearly, this chain consists
of $N+1$ states. The transition probabilities between these states can be
expressed in terms of parameters from the user chain. If $S_i$ denotes the
state of the system with $i$ non-empty and $(N-i)$ empty nodes and if $P_i$ denotes
the steady-state probability of $S_i$, then $P_s$ can be expressed as


Since the system chain equations depend on parameters from the
user chain and vice versa, the two chains are solved together using an
iterative algorithm. The results from our approximate model compare very well
with simulation, particularly when arrival rates are low.

In order to improve the accuracy of the model, we expanded the
system chain to include the identity of the individual queues and their
status (empty or non-empty). This adjustment to the system model proved to
dramatically improve the performance of our approximation. However, with
this change the system chain consists of $2^N$ states and is difficult to solve
for all but very small values of N. To overcome this shortcoming of the
expanded model, we limited the system chain so that it merely represents the
identity and state of one user (our user of interest) along with the number
of non-empty nodes on the ring. This modification permits a more accurate
derivation of the probability of success for the user chain. This is because the
probability of success is defined to be the probability that the user is chosen
to be served given that it is non-empty. Therefore, when the system chain
contains the state of our user, we can compute the probability of success by
conditioning on the user state being non-empty. It turns out that this new
model is just as accurate as the previous model (containing the identities of
all of the users), but since this new chain has only $2(N+1)$ states it is much
easier to analyze.

Since the improved model is so much more accurate than the original
model at minimal additional complexity, we were motivated to develop a
similar modification for the ALOHA multiple access protocol. In the ALOHA
case  we  consider  a  finite  number  of  users,  each  accepting  packets  that  arrive 
independently  according  to  a  Bernoulli  random  process,  competing  for  the 
use  of  a  single  channel.  If  a  terminal  is  empty  (has  no  packets),  a  newly 
arrived  packet  is  transmitted  immediately.  The  transmission  is  successful 
if  and  only  if  no  other  user  attempts  transmission  during  the  same  slot, 
otherwise  a  collision  occurs  and  the  terminal  enters  the  blocked  state.  When 
in  the  blocked  state,  the  terminal  attempts  re-transmission  with  probability 
p.  In  case  of  success  the  terminal  becomes  unblocked.  An  unblocked  terminal 
can  be  in  one  of  two  states;  idle  (when  its  queue  is  empty),  or  active  (when  its 
queue  is  not  empty).  An  active  terminal  transmits  a  packet  with  probability 
one. 
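
The ALOHA model described above can be explored by direct simulation. The following is a minimal sketch, not the approximate two-chain model of the paper. It assumes that arrivals in a slot are processed before transmissions (so a packet arriving at an empty, unblocked terminal is sent in the same slot) and that every terminal involved in a collision becomes blocked. The function name `simulate_aloha` and its parameters are hypothetical.

```python
import random


def simulate_aloha(num_users, arrival_prob, retx_prob, num_slots, seed=0):
    """Simulate the slotted ALOHA model with blocked/unblocked terminals.

    Returns (arrivals, delivered, backlog): the packets generated, the
    packets successfully transmitted, and the packets still queued.
    """
    rng = random.Random(seed)
    queue = [0] * num_users          # queue length of each terminal
    blocked = [False] * num_users    # True after a collision, until a success
    arrivals = 0
    delivered = 0
    for _ in range(num_slots):
        # Bernoulli arrivals, independent across terminals (assumed to
        # occur at the start of the slot).
        for u in range(num_users):
            if rng.random() < arrival_prob:
                queue[u] += 1
                arrivals += 1
        # Unblocked terminals with packets transmit with probability one;
        # blocked terminals retransmit with probability retx_prob.
        senders = []
        for u in range(num_users):
            if queue[u] == 0:
                continue
            if not blocked[u] or rng.random() < retx_prob:
                senders.append(u)
        if len(senders) == 1:
            # Success: exactly one terminal transmitted in this slot.
            u = senders[0]
            queue[u] -= 1
            delivered += 1
            blocked[u] = False
        elif len(senders) > 1:
            # Collision: every transmitting terminal becomes blocked.
            for u in senders:
                blocked[u] = True
    return arrivals, delivered, sum(queue)


if __name__ == "__main__":
    a, d, q = simulate_aloha(num_users=5, arrival_prob=0.05,
                             retx_prob=0.2, num_slots=100_000, seed=1)
    print("throughput per slot:", d / 100_000, "backlog:", q)
```

Running the sketch for increasing `arrival_prob` gives a simple baseline against which an approximate model could be compared.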

The  state  of  any  single  user  can  be  specified  by  its  queue  size  and 
by  the  indication  of  whether  it  is  in  the  blocked  or  active  states.  A  complete 
description of an N-terminal system requires the analysis of a 2N-dimensional
infinite  Markov  chain.  Again,  such  chains  are  known  to  be  very  difficult  to 
analyze.  We  therefore  resort  to  an  approximation. 

As was stated earlier, in [1] an approximation was developed which
modeled an N-dimensional infinite Markov chain as a one-dimensional infinite
chain representing the state of a single user together with an N-dimensional
finite chain representing the number of blocked and active users in the entire
system. In [3] an improvement to the above model was proposed which
expanded the system chain to include the identity of all N users. That expanded
model was shown to perform far better than the model in [1]; however, the
expanded system chain contained $3^N$ states and was very difficult to analyze
for all but very small values of N. We therefore develop a new system chain,
similar to the one developed for the multiple-node broadcast algorithm, which
includes the state of only one user together with the number of active and
blocked users in the entire system.

Our analysis shows that this refined model performs very well at
low arrival rates and offers an improvement over the original model in which
the system chain contained no information about the individual terminals;
however, it does not perform as well as the improved model which contained
the identity of all N users. The differences are most noticeable when the
arrival rates are high (close to saturation).

The Exhaustive Cycle-Gated (ECG) Service Scheme is
used in a single-server multiqueue system. It works as follows:
at the beginning of each cycle, when the server reaches queue $Q_1$,
a "poller" is dispatched from $Q_1$. This poller visits queues in a
sequential order $Q_1, Q_2, Q_3, \ldots, Q_N$. When the poller reaches a
queue, it marks all the customers present in the queue for service
in that cycle. The poller incurs zero delay at each queue;
however, there is a fixed switch-over time from one queue to the
next. The server serves the queues in the same order. At each
queue it serves all the marked customers. Meanwhile, the
customers arriving to all queues have to wait for the next cycle
to receive service. When the server returns to $Q_1$, a new cycle
begins. The queues are of infinite length and customer arrival
processes to the queues are nonsymmetric independent Poisson
processes. The service time of customers has a general
distribution which is the same for all the queues (though this can
be generalized to different service distributions at stations).
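
The cycle structure described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: service times are taken to be exponential (the text allows a general distribution), the server and the poller use the same fixed switch-over time, and a switch-over is also incurred when the server returns from the last queue to the first. The function name `simulate_ecg` and its parameters are hypothetical.

```python
import random


def simulate_ecg(arrival_rates, mean_service, switch_time, horizon, seed=0):
    """Return the mean waiting time (arrival to start of service) per queue."""
    if switch_time <= 0 or mean_service <= 0:
        raise ValueError("switch_time and mean_service must be positive")
    rng = random.Random(seed)
    n = len(arrival_rates)
    # Pre-generate Poisson arrival epochs for every queue up to the horizon.
    arrivals = []
    for rate in arrival_rates:
        t, epochs = 0.0, []
        while True:
            t += rng.expovariate(rate)
            if t > horizon:
                break
            epochs.append(t)
        arrivals.append(epochs)
    next_idx = [0] * n               # first unserved customer in each queue
    waits = [[] for _ in range(n)]
    cycle_start = 0.0
    while cycle_start < horizon:
        server = cycle_start         # server reaches the first queue
        for i in range(n):
            # The poller has zero delay at queues, so it reaches queue i
            # after i switch-overs and marks everyone present at that time.
            mark_time = cycle_start + i * switch_time
            q = arrivals[i]
            while next_idx[i] < len(q) and q[next_idx[i]] <= mark_time:
                waits[i].append(server - q[next_idx[i]])
                server += rng.expovariate(1.0 / mean_service)  # assumed exponential
                next_idx[i] += 1
            server += switch_time    # move to the next queue (or back to the first)
        cycle_start = server         # a new cycle begins
    return [sum(w) / len(w) if w else 0.0 for w in waits]


if __name__ == "__main__":
    print(simulate_ecg([0.1, 0.2, 0.1], mean_service=1.0,
                       switch_time=0.5, horizon=50_000.0, seed=1))
```

Customers who arrive after the poller has passed their queue are left unmarked and wait for the following cycle, which is the gating behavior the scheme is named for.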

The ECG scheme is related to an existing service
scheme, the Gated Sequential Service (GSS) scheme [1], which
is used in the Fasnet protocol for high-speed fiber optic bus
Local Area Networks. However, in the GSS scheme only the
head-of-line customer present at each queue at the beginning of a
cycle is served during that cycle. Hence, in our terminology, the
service scheme is a 1-limited cycle-gated scheme. The ECG
scheme is also related to a recently proposed and analyzed
scheme, the Globally Gated (GG) service scheme [2]. In the GG
scheme there is a global clock which, at the beginning of each
cycle, gates the customers in all the queues. These are the
customers which receive service when the server arrives at the
queue. The GG scheme is impractical for high-speed networks as
it is very difficult to maintain a global clock.

We analyze the ECG scheme and derive closed-form
expressions for the moment generating function, mean and
variance of the waiting time and the number of customers served
at each queue at steady state. For the analysis, we employ a
space-time diagram to model the system, as it elegantly captures
the 2-dimensional nature of the problem. From the space-time
diagram, the waiting time of each customer is shown to be
composed of three components. Equations for these are derived,
which in turn give the Laplace transform of the waiting time of
customers at individual stations. We then highlight some
properties of the Exhaustive Cycle-Gated (ECG) service scheme.
We show that the ECG scheme is similar to window-gated access
schemes, in that only those customers are served in a cycle
whose arrival time falls within a time window of limited
duration which is the same for all stations. The ECG scheme
also leads to a natural prioritization of the queues,
$EW_1 < EW_2 < \cdots < EW_N$, where $EW_i$ is the mean waiting time of a customer
in $Q_i$. Moreover, under general nonsymmetric load distribution,
the ratio of the mean waiting time at $Q_N$ to that at $Q_1$ is less
than $(1 + 3\rho)$, i.e. $EW_N / EW_1 < (1 + 3\rho)$, where $\rho$ is the normalized
load on the system ($\rho < 1$). Thus the maximum unfairness is
bounded. We also show that the average waiting time of
customers arriving to the system (averaged over all the queues) is
independent of the spatial distribution of the load in the system.
The number of customers served at a station during each cycle is
proportional to the arrival rate at that station.

Comparing the ECG scheme to the exhaustive polling
discipline, we find that the average waiting time is higher for the
ECG scheme. However, it has been widely accepted in the polling
literature that the exhaustive service discipline is unfair to lightly
loaded queues in nonsymmetric traffic arrival scenarios. By
considering several cases for low switch-over time between
queues, we show that the ECG scheme is more fair to the lightly
loaded stations in that they have lower mean waiting times than
in the exhaustive service discipline. Extensive numerical results
for the ECG scheme as incorporated into Fasnet are given in [3].
Simulation results included therein validate our analysis.

An extension of our work is to consider a variation of
the ECG service scheme in which the polling and service order
reverses from one cycle to the next. For this service scheme, the
Exhaustive Reversing Cycle Gated (ERCG) scheme, we use
similar space-time modelling techniques for the analysis of the
system. Details of the analysis can be found in [4].

The analysis can also be applied to models for internal
mail delivery systems [5]. In these models a clerk picks up, sorts
and delivers mail to a closed loop of offices. The mail picked up
in a round is sorted in the mailroom and delivered in the next
round. The marking of customers (mail) for service is the same
as in the ECG scheme; however, the server collects the
customers from all the queues and serves them in the mailroom
rather than serving them at the individual queues as in the ECG
scheme.

1 Introduction

In this paper we extend an existing signal estimation method [1]
so as to estimate the samples of a segment of a stationary random
signal embedded in noise where the correlation structure of the
process is unknown. The method assumes little prior information
and can be applied as a pre-processing step of “cleaning up” the
data.

The extension of the method is based on the idea of reducing
the rank of the signal model in order to lower the mean-squared
estimation error. Rank reduction is a general method for reducing
the complexity in linear statistical models in order to lower the
mean-squared error in the estimate and to decrease the sensitivity
to measurement errors. By using reduced-rank models we lower
the variance of the estimate at the expense of introducing bias in
the estimate and, by having the right tradeoff between bias and
variance, the overall mean-squared estimation error is reduced. In
the past, this idea of using rank-reduced models to obtain biased
estimates with lower mean-squared errors has been explored for
modeling stationary signals with known correlation structure [2],
and for solving linear least squares problems [3]. In this paper
we analyze the use of a reduced-rank signal model in the context
of bias/variance tradeoff for signal vector estimation when the
correlation structure of the signal is unknown.

2  Signal  Estimation  Method 

Consider an observed data vector $y$ ($L \times 1$) containing a signal
component $\tilde{y}$ and a noise component $w$. We are interested in estimating
the signal component $\tilde{y}$ from the observed data vector $y$. The
correlation structure of the signal $\tilde{y}$ and the noise $w$ is unknown.

$$y = \tilde{y} + w \qquad (1)$$

The first step of the method is to form a Toeplitz data matrix
$Y_{m \times n}$ from the observed data vector $y$. The data matrix
$Y$ can also be represented as a sum of $\tilde{Y}$, the signal-only
Toeplitz matrix, and $W$, the noise Toeplitz matrix.

$$Y = \tilde{Y} + W \qquad (2)$$

In the second step of the algorithm the Singular Value Decomposition
(SVD) of the Toeplitz data matrix $Y$ is formed as follows:

$$Y = U_r \Sigma_r V_r^H + U_o \Sigma_o V_o^H \qquad (3)$$

$$\phantom{Y} = Y_r + Y_o. \qquad (4)$$

From the rank-$r$ approximation $Y_r$, the signal component is recovered.
The residual matrix $Y_o$ contains vestiges of the signal
component which are traded off in order to reduce the overall
mean squared estimation error.

In the third and final step of the algorithm each element of
the estimated signal vector $\hat{y}$ is obtained by a linear combination
of the elements of the matrix $Y_r$. This step of signal recovery can
be represented as a matrix filtering operation

$$\hat{y} = A\,y_r, \qquad (5)$$

where $y_r$ is a column vector with $mn$ elements obtained by concatenation
of the columns of $Y_r$. The matrix $A$ consists of filter
weights. See [4, 5] for details.
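
The three steps can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' filter: it assumes real-valued data, builds the Toeplitz matrix with entries `Y[i, j] = y[i - j + n_cols - 1]`, and recovers each sample by plain averaging along the corresponding diagonal of the rank-reduced matrix, which is one particular choice of the filter weights. The name `reduced_rank_estimate` and its arguments are hypothetical.

```python
import numpy as np


def reduced_rank_estimate(y, n_cols, rank):
    """Toeplitz data matrix -> rank-r SVD truncation -> diagonal averaging."""
    y = np.asarray(y, dtype=float)
    length = y.size
    n_rows = length - n_cols + 1
    # Step 1: index map of the Toeplitz data matrix, Y[i, j] = y[i - j + n_cols - 1].
    idx = np.arange(n_rows)[:, None] - np.arange(n_cols)[None, :] + n_cols - 1
    Y = y[idx]
    # Step 2: SVD and rank-r approximation (keep the r largest singular values).
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    Y_r = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    # Step 3: every sample appears along one diagonal of Y; average the
    # corresponding entries of Y_r (assumed choice of linear combination).
    sums = np.zeros(length)
    counts = np.zeros(length)
    np.add.at(sums, idx, Y_r)
    np.add.at(counts, idx, 1.0)
    return sums / counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = np.arange(200)
    clean = np.cos(0.3 * k)
    noisy = clean + 0.5 * rng.standard_normal(k.size)
    est = reduced_rank_estimate(noisy, n_cols=20, rank=2)
    print("noisy MSE:", np.mean((noisy - clean) ** 2))
    print("estimate MSE:", np.mean((est - clean) ** 2))
```

A single real sinusoid gives a Toeplitz data matrix of rank 2, which is why `rank=2` is used in the usage example; in practice the effective rank has to be chosen from the data or from prior knowledge.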

In  this  paper  we  assume  that  an  appropriate  effective  rank  for 
the  signal  model  has  been  determined  from  the  data  or  from  prior 
knowledge.  An  example  of  determining  the  effective  rank  directly 
from  the  data  is  presented  in  [6]. 

3  Applications 

The generalization of the method can be readily used in the applications
of the original method, e.g. (1) adaptive detection [7, 6],
where one can temporarily treat strong interference as a signal to
be estimated and then subtracted from the data, and (2) data-adaptive
improvement of SNR as a pre-processing step for estimating
the values of signal parameters. The performance of approximate
maximum likelihood estimation of signal parameters
can be improved by first estimating the waveform of the signal
component [8].

Abstract: The blind equalization problem for multi-channel
data transmission is investigated. The algorithm is based on the
principle of distribution matching, i.e., the total system must be
transparent when both the transmitted signal and the equalizer
output follow the same distribution, where the transmitted
signal is assumed to be an IID sequence. The difficulty in its extension
toward multi-dimensional cases is to reduce simultaneously
the cross-interference between channels and the inter-symbol interference
in each channel. The cost function, which measures the distance
between the joint distribution of the equalizer output vector ($z_k$) and
that of the transmitted vector ($a_k$), is able to solve the difficulty. The
proposed algorithm is closely related to the minimum entropy deconvolution
(MED), whose cost function measures the distance
from the Gaussian to the distribution of the equalizer output. By extending
the kurtosis used in MED theory to multi-dimensional cases,
we derive another cost function which appears to be equivalent to
the first proposed cost function except for power normalization.

In multi-dimensional cases, the transmitted signal $a_k$, the received
signal $y_k$ and the equalizer output $z_k$ are series of vectors,
and the channel response $H_k$ and the equalizer $W_k$ are series of
matrices. For example, equalization in multi-carrier data transmission
is a typical such model. In these cases, $a_k$ should be
assumed to be multi-dimensional IID (independent identically distributed),
i.e., independent both in the time series of each element
of the vector and among elements. Therefore, the equalizer must
eliminate not only intersymbol interference in each channel but
also cross-interference between channels. Letting $T_k$ be the matrix
series of the total response, our goal is written as $T_0 = I$ (unit
matrix), $T_k = 0$ (zero matrix) ($k \ne 0$). Two types of algorithms
for blind equalization have been reported. One is based on the
strategy of forcing the probability distribution of the equalizer output
to that of the transmitted signal. The other aims to eliminate
the time-dependency in the equalizer output. In this paper, we
extend the first algorithm toward multi-dimensional cases. In
one-dimensional cases, we have the cost function $E[(z_k - \gamma\,\mathrm{sign}(z_k))^2]$,
where sign(·) is the signum function and $\gamma$ is a constant value given by
$\gamma = E[a_k^2]/E[|a_k|]$. It is shown that this cost function measures the
distance between the distributions of $a_k$ and $z_k$, and that minimizing
it makes the total system transparent. An extension to multi-dimensional
cases is easily given by rewriting $z_k$ in vector form.
However, this cannot remove the inter-channel interference. For
instance, a significant problem occurs when plural channels degenerate
into a single channel and some of the channels are dropped out.
To solve this, we propose a new cost function:

$$J_1 = A\,E\big[(\|z_k\|^2 - \gamma_r)^2\big] + B \sum_i E\big[((z_k^{(i)})^2 - \gamma_i)^2\big], \qquad (1)$$

where $\|\cdot\|$ ($= \sqrt{z_k^T z_k}$) is the Euclidean norm, $z_k^{(i)}$ is the $i$-th element
of $z_k$, $\gamma_r = E[\|a_k\|^4]/E[\|a_k\|^2]$, $\gamma_i = E[(a_k^{(i)})^4]/E[(a_k^{(i)})^2]$, and $A$
and $B$ are positive constant parameters. Minimizing the first
term gives the joint distribution of $z_k$ the same shape as that
of $a_k$. But the ambiguity of an arbitrary amount of rotation remains,
since the first term refers only to the information of the radius of $z_k$.
The second term contributes to adjustment of the rotation in the
desired direction so that each pair of elements $z_k^{(i)}$ and $a_k^{(i)}$ has the
same shape of distribution. As a result, when the distribution of $a_k$ is
sub-Gaussian, minimizing the cost function $J_1$ guarantees blind
equalization up to several remaining ambiguities: channel
swapping, polarity, and time shift among the time series of each
element of $z_k$. It should be noted that the time-shift ambiguity
causes a troublesome problem. Consider the case where the time
series of some elements of $z_k$ shift in different ways. This can be
permitted if the time series of the $i$-th element of $z_k$ is detected as a
time-shifted series $a_{k+m}^{(i)}$ of its own channel signal. However,
our algorithm has a risk of channel drop-out, as the time series of
some elements of $z_k$ may be detected as time-shifted series
of another channel's time series. One possible way
to avoid such trouble is to employ a more general and larger-scale
joint-distribution matching that includes a number of time-shifted
signals $\ldots, z_{k-1}, z_k, z_{k+1}, \ldots$.
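
To make the one-dimensional cost function $E[(z_k - \gamma\,\mathrm{sign}(z_k))^2]$ with $\gamma = E[a_k^2]/E[|a_k|]$ concrete, the sketch below estimates both quantities from finite samples. The function names `dispersion_constant` and `one_dim_cost` are hypothetical, and expectations are replaced by sample averages as an assumption of the example.

```python
def sign(x):
    """Signum function: -1, 0 or +1."""
    return (x > 0) - (x < 0)


def dispersion_constant(symbols):
    """Sample estimate of gamma = E[a^2] / E[|a|] for the transmitted symbols."""
    mean_sq = sum(a * a for a in symbols) / len(symbols)
    mean_abs = sum(abs(a) for a in symbols) / len(symbols)
    return mean_sq / mean_abs


def one_dim_cost(outputs, gamma):
    """Sample estimate of E[(z - gamma * sign(z))^2] for equalizer outputs z."""
    return sum((z - gamma * sign(z)) ** 2 for z in outputs) / len(outputs)


if __name__ == "__main__":
    symbols = [1.0, -1.0, -1.0, 1.0]            # binary +/-1 source
    gamma = dispersion_constant(symbols)        # equals 1 for this source
    print(one_dim_cost(symbols, gamma))         # undistorted output
    print(one_dim_cost([0.5, -1.5, -0.5, 1.5], gamma))  # distorted output
```

For a binary ±1 source the cost of an undistorted output is zero, while any spreading of the output amplitudes raises it.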

The proposed method is closely related to the minimum entropy
deconvolution (MED), which drives the equalizer output as far
as  possible  from  Gaussianity  1*1.  Note  that  the  distribution  of  the 
received signal approaches the Gaussian monotonically as
the distortion of the channel increases. Therefore, the minimization
of the distance between the distributions of $a_k$ and $z_k$ means the
maximization of the distance between the Gaussian and the distribution
of $z_k$. Extending the MED theory to multi-dimensional
cases, we have another cost function using the kurtosis, such as

$$J_2 = A\,[\,\cdots\,] + B \sum_{i=1}^{N} \frac{E\big[(z_k^{(i)})^4\big]}{\big(E\big[(z_k^{(i)})^2\big]\big)^2} \qquad (2)$$


$J_2$ is similar to $J_1$ except that the power normalization function
does not work due to the non-dimensionality of $J_2$. $J_2$ can be applied
to the super-Gaussian case of $a_k$ if the second term is maximized
by setting $B < 0$. Figs. 1 and 2 show the simulation results
of the on-line algorithm for $J_1$ and the off-line algorithm for $J_2$ in the
two-dimensional case, where $a_k$ follows the uniform distribution.
For an ill-conditioned channel response, if $J_1$ and $J_2$ lack their first
terms, the phenomenon of channel drop-out easily occurs, since the
distribution of each element of $z_k$ is merely forced to that of $a_k$. It
is seen in Figs. 1 and 2 that each channel is successfully separated.

Fig. 1. Time evolution of the on-line algorithm for $J_1$

Summary

This  paper  deals  with  the  following  problem  of  esti¬ 
mating  a  random  process  from  a  finite  number  of  observa¬ 
tions,  which  arises  in  statistical  communication  theory  and 
signal processing as well as in geology (Journel and Huijbregts,
1978) and environmental science (Christakos, 1992).

Suppose a random process $X(t)$, $t \in [0,1]$, is sampled
at  a  finite  number  of  appropriately  designed  points.  On  the 
basis  of  these  observations,  we  want  to  estimate  the  values 
of  the  process  at  the  unsampled  points  and  we  measure  the 
performance  by  an  integrated  mean  square  error  (IMSE). 

The  process  can  be  modeled  as 

$$X(t) = m(t) + N(t), \quad t \in [0,1].$$

Here  m(t)  is  the  nonrandom  large-scale  mean  structure  and 
we  consider 

(1)  the  case  where  m(t)  is  known  or,  equivalently  equals 
zero; 

(2) the semiparametric (regression) model where the mean
can be modeled as $m(t) = \beta_1 f_1(t) + \cdots + \beta_q f_q(t)$, where the
$\beta_j$'s are unknown coefficients and the $f_j$'s are known (regression)
functions; and

(3)  the  nonparametric  case  where  the  macroscopic  mean 
structure  m(t)  is  unknown. 

$N(t)$ is the small-scale random structure which models the
temporal dependence and has zero mean and known covariance
function $R(t,s) = E\,N(t)N(s)$. The centered process
$X(t)$ is assumed to have no quadratic mean derivative and
the functions $m(t)$ and $f_j(t)$ are of comparable smoothness
with the microscopic purely random part $N(t)$; specifically,
$m(t)$ and $f_j(t)$ are of the form $\int_0^1 R(t,s)\,\phi(s)\,ds$.

There  are  three  findings. 

The main one is the specification of simple sampling designs
which are asymptotically optimal as the sample size
increases to infinity. This is done for a variety of estimators.
First the best linear unbiased estimator (BLUE) of $X(t)$ is
used in Cases (1) and (2) and a nearly BLUE in the nonparametric
Case (3). The rate of convergence to zero of the
IMSE is $n^{-1}$:

$$\lim_{n \to \infty} n \, \mathrm{IMSE}_n = \int_0^1 \frac{c(t)}{h(t)}\,dt,$$

where $c(t)$ depends on the covariance function $R(t,s)$ and
$h(t)$ is the density of the sampling design.

The second finding is that asymptotically the mean has
no  effect  on  the  overall  performance  and  can  therefore  be 
neglected.  This  quantifies  the  discussions  in  Journel  and 
Rossi  (1989)  and  Sacks  et  al.  (1989,  p.  415).  However,  an 
example  of  linear  regression  in  Wiener  noise  shows  that  the 
mean  function  may  cause  some  perturbation  on  the  optimal 
sampling  design  points. 

The third finding is that the very simple and nonparametric linear interpolation also leads to an asymptotically optimal performance!
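As a small numerical illustration of the $n^{-1}$ rate, the sketch below evaluates the IMSE of the zero-mean BLUE (Case (1)) for one concrete choice that is not taken from the paper: the Wiener covariance $R(t,s) = \min(t,s)$ sampled at the uniform design $t_i = i/n$. The function name `imse_known_mean` and the quadrature grid are hypothetical. For this covariance the BLUE reduces to linear interpolation between neighbouring samples, and $n\,\mathrm{IMSE}_n$ equals $1/6$.

```python
import numpy as np

def imse_known_mean(design, cov, grid):
    """IMSE of the BLUE of a zero-mean process observed at the points `design`.

    cov(s, t) is the covariance function; the integral over [0, 1] is
    approximated by the trapezoid rule on `grid`.
    """
    R = cov(design[:, None], design[None, :])   # covariance of the observations
    r = cov(grid[:, None], design[None, :])     # cross-covariances, one row per grid point
    weights = np.linalg.solve(R, r.T)           # BLUE weights, one column per grid point
    mse = cov(grid, grid) - np.sum(r.T * weights, axis=0)
    return float(np.sum(0.5 * (mse[1:] + mse[:-1]) * np.diff(grid)))

wiener_cov = np.minimum                          # assumed example: R(t, s) = min(t, s)
grid = np.linspace(0.0, 1.0, 2001)

scaled_imse = {}
for n in (10, 20):
    design = np.arange(1, n + 1) / n             # uniform design, density h(t) = 1
    scaled_imse[n] = n * imse_known_mean(design, wiener_cov, grid)
    print(n, scaled_imse[n])
```

Under these assumptions the scaled error stays near $1/6$ for both sample sizes, which is the behaviour the limit formula describes with $h(t) = 1$.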

If the centered $N(t)$ has exactly $k$ ($k = 1, 2, \dots$) quadratic mean derivatives, the convergence rate of the IMSE is likely to be but we do not investigate further this conjecture.
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Summary 

The classical Shannon sampling theorem has many applications and generalizations. From the wavelet transform point of view, it provides the sinc wavelets. Recently it has also been extended to general wavelet subspaces by Walter [1]. It says that if a signal $f(t)$ is in the wavelet subspace $V_j(\phi)$, then

$$f(t) = \sum_n f\!\left(\frac{n}{2^j}\right) x(2^j t - n), \qquad (1)$$

where the interpolant $x(t)$ has its Fourier transform

$$X(\omega) = \frac{\hat\phi(\omega)}{\sum_n \hat\phi(\omega + 2n\pi)}, \qquad (2)$$

and $\hat\phi(\omega)$ is the Fourier transform of the scaling function $\phi(t)$ and $\sum_n \hat\phi(\omega + 2n\pi) \ne 0$ for any real $\omega$. Aldroubi and Unser [2] considered the case where $x(t)$ is a scaling function. In particular, they called a scaling function $\phi(t)$ satisfying

$$\phi(n) = \begin{cases} 1, & n = 0 \\ 0, & n = \pm 1, \pm 2, \dots, \end{cases} \qquad (3)$$

a cardinal scaling function. In the following, we call the wavelets generated from cardinal scaling functions cardinal wavelets. It is clear that, for a cardinal scaling function $\phi(t)$, the sampling theorem is

$$f(t) = \sum_n f\!\left(\frac{n}{2^j}\right) \phi(2^j t - n), \qquad \forall f(t) \in V_j(\phi). \qquad (4)$$

Since $\phi(t)$ is a cardinal function if and only if $\sum_n \hat\phi(\omega + 2n\pi) = 1$, Walter's sampling theorem implies that (3) is also necessary for (4) to be true. This concludes the following proposition.

Proposition 1. The sampling theorem (4) is true if and only if $\phi(t)$ is a cardinal scaling function. □

In this research, we further classify cardinal scaling functions which satisfy (3) and prove that a scaling function $\phi(t)$ with compact support is a cardinal scaling function if and only if $\phi(t)$ is the scaling function corresponding to the Haar wavelets, that is, $\phi(t) = \chi_{[0,1)}(t)$, where $\chi_{[a,b)}(t)$ is an indicator function which is 1 for $t \in [a, b)$ and 0 otherwise. We present a family of cardinal scaling functions which are generalizations of the Haar scaling function $\chi_{[0,1)}(t)$.

If $f(t)$ is not in $V_j(\phi)$, generally (1) is not true. When $f(t) \in V_{j+1}(\phi)$, Walter [1] estimated the error between $f(t)$ and

$$\sum_n f\!\left(\frac{n}{2^j}\right) x(2^j t - n),$$

where $x(t)$ satisfies (2). In this research, we consider cardinal wavelets, that is, the sampling theorem (4). We estimate the above error for $f(t)$ which are not necessarily in $V_{j+1}(\phi)$, when $\phi(t)$ is a cardinal scaling function.

*This research is supported in part by NSF under Grant NCR-890S052.

As an application of the sampling theorems (1) and (4), efficient computations of wavelet series transform (WST) coefficients from uniform samples of a signal were considered by researchers [3-4]. In particular, if $f(t)$ satisfies (1) (or (4)), then the WST coefficients of $f(t)$ can be exactly obtained from $f(n/2^j)$ by using the Shensa algorithm (or the Mallat algorithm, a special case of Shensa's). Since the Mallat algorithm is generally simpler than Shensa's, one might prefer the sampling theorem (4). For signals which are not necessarily in wavelet subspaces, an error usually occurs when one uses the Mallat algorithm to compute the WST coefficients. In this research, we also present several numerical examples to compare the errors when the Haar wavelets, Daubechies D4 and Dg wavelets and cardinal wavelets are used. These examples indicate that the error for the cardinal wavelets is much smaller than the ones for other wavelets.
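The sampling theorem (4) can be checked numerically in the one compactly supported case identified above, the Haar scaling function $\chi_{[0,1)}(t)$. The following is a minimal sketch under stated assumptions: the level `j`, the random coefficients and the helper names `haar_scaling`, `signal` and `reconstruct` are all hypothetical, and a signal in $V_j$ is taken to be a step function that is constant on each interval $[n/2^j, (n+1)/2^j)$.

```python
import numpy as np

def haar_scaling(t):
    """Indicator function of [0, 1): the Haar scaling function."""
    return ((t >= 0) & (t < 1)).astype(float)

j = 2                                      # resolution level (assumed)
n_values = np.arange(-8, 8)                # sample indices n, covering t in [-2, 2)
rng = np.random.default_rng(0)
coeffs = rng.normal(size=n_values.size)    # arbitrary step heights

def signal(t):
    """A signal in the Haar subspace V_j: constant on [n/2^j, (n+1)/2^j)."""
    idx = np.floor(2**j * t).astype(int) - n_values[0]
    return coeffs[idx]

def reconstruct(samples, t):
    """Right-hand side of the cardinal sampling theorem with phi = Haar."""
    return sum(s * haar_scaling(2**j * t - n) for s, n in zip(samples, n_values))

t = np.arange(-128, 128) / 64.0            # dyadic evaluation points in [-2, 2)
samples = signal(n_values / 2**j)          # uniform samples f(n / 2^j)
max_error = float(np.max(np.abs(signal(t) - reconstruct(samples, t))))
print(max_error)
```

The reconstruction uses only the samples $f(n/2^j)$ and the scaling function itself, with no separate interpolant $x(t)$, which is the practical appeal of cardinal scaling functions.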
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Abstract - In general, two sequences formed by uniformly sampling two orthogonal signals will not be orthogonal. This paper presents families of discrete orthonormal wavelet bases for $\ell^2$ that are obtained by sampling of certain dyadic orthonormal wavelet bases of $L^2$ over a bounded frequency band.


1.  INTRODUCTION 

Numerous wavelet bases for $L^2(\mathbb{R})$ have been described in recent mathematical and engineering literature. Because of their tractability in applications, dyadic orthonormal wavelet bases have received considerable attention. Use of such bases in discrete-time settings generally involves sampling of the mother wavelet at uniform intervals which are power-of-two multiples of a fixed interval $T$. This raises the issue that the sample sequences obtained from orthogonal time-scale replicates of the mother wavelet may not be orthogonal.

This paper investigates the problem of obtaining (discrete-time) dyadic wavelet bases for $\ell^2$ from (continuous-time) dyadic wavelet bases of $L^2$. In particular, a construction of S. Mallat is extended and the Whittaker-Kotel'nikov-Shannon (WKS) sampling theorem is applied to obtain a family of discrete orthonormal wavelet bases for $\ell^2$.


3.  BASIS  CONSTRUCTION 


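In the frequency domain, define

$$\hat W^a(\omega) = \begin{cases} 1 & \text{if } \pi < |\omega| < 2\pi \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

and for $m \in \mathbb{N}$ and $n \in \mathbb{Z}$, define $\hat W^a_{m,n}(\omega) = 2^{m/2}\,\hat W^a(2^m\omega)\,e^{-i 2^m n \omega}$, or $W^a_{m,n}(t) = 2^{-m/2}\,W^a(t/2^m - n)$. Then the $W^a_{m,n}$ with $m \in \mathbb{N}$ and $n \in \mathbb{Z}$ form an orthogonal basis of $L^2[-\pi, \pi]$. Moreover, each $W^a_{m,n}$ is bandlimited to $[-\pi, \pi]$ and hence by the WKS theorem may be represented by samples $\{W^a_{m,n}(k)\}_{k\in\mathbb{Z}}$ or $\{W_{m,n}(k)\}_{k\in\mathbb{Z}}$.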

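Theorem 1: The sample sequences $\{W_{m,n}(k)\}_{k\in\mathbb{Z}}$ are orthonormal; i.e., for arbitrary dyadic dilation indices $m \in \mathbb{N}$ and $m' \in \mathbb{N}$ and integer time shifts $n \in \mathbb{Z}$ and $n' \in \mathbb{Z}$,

$$\sum_{k\in\mathbb{Z}} W_{m,n}(k)\,W_{m',n'}(k) = \begin{cases} 0 & \text{if } (m,n) \ne (m',n') \\ 1 & \text{if } (m,n) = (m',n') \end{cases} \qquad (2)$$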


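Proof: From above, $\{W^a_{m,n}\}$ is an orthogonal basis of $L^2[-\pi,\pi]$ and $\{W_{m,n}\}$ is generated by sampling the corresponding analog function with sampling interval $T = 1$. Note that the dyadic dilation index $m$ is restricted to be a natural number and the time shift index $n$ to be an integer.

Let $\hat W_{m,n}$ be the discrete-time Fourier transform (DTFT) of the discrete signal $W_{m,n}$:

$$\hat W_{m,n}(\omega) = \sum_k W_{m,n}(k)\,e^{-i\omega k}.$$

The relationship between $\hat W^a_{m,n}$ and $\hat W_{m,n}$ is

$$\hat W_{m,n}(\omega) = \sum_k \hat W^a_{m,n}(\omega - 2\pi k). \qquad (3)$$

Since $\hat W^a_{m,n}(\omega) = 0$ when $|\omega| > \pi$,

$$\hat W_{m,n}(\omega) = \hat W^a_{m,n}(\omega) \quad \text{when } |\omega| < \pi. \qquad (4)$$

By Parseval's theorem,

$$\sum_k W_{m,n}(k)\,W_{m',n'}(k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat W_{m,n}(\omega)\,\overline{\hat W_{m',n'}(\omega)}\,d\omega.$$

From equation (4),

$$\sum_k W_{m,n}(k)\,W_{m',n'}(k) = \langle W^a_{m,n},\, W^a_{m',n'} \rangle.$$

Orthonormality of the analog functions thus implies orthonormality of the sample sequences $\{W_{m,n}\}$. ∎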
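The orthonormality claim of Theorem 1 can be spot-checked numerically. The sketch below assumes the mother wavelet is the inverse Fourier transform (with a $1/2\pi$ normalization) of the indicator of $\pi < |\omega| < 2\pi$, which gives $W^a(t) = 2\,\mathrm{sinc}(2t) - \mathrm{sinc}(t)$ with $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$. The helper names and the truncation of the infinite sum to $|k| \le 5000$ are choices made for this illustration only.

```python
import numpy as np

def shannon_wavelet(t):
    """Assumed analog mother wavelet: spectrum equal to 1 on pi < |w| < 2*pi."""
    return 2.0 * np.sinc(2.0 * t) - np.sinc(t)    # np.sinc(x) = sin(pi x) / (pi x)

def sampled_wavelet(m, n, k):
    """Samples W_{m,n}(k) = 2^(-m/2) W^a(k / 2^m - n) taken at interval T = 1."""
    return 2.0 ** (-m / 2.0) * shannon_wavelet(k / 2.0 ** m - n)

k = np.arange(-5000, 5001)                        # truncated sample index range
index_pairs = [(m, n) for m in (1, 2, 3) for n in (-1, 0, 1)]
basis = np.array([sampled_wavelet(m, n, k) for m, n in index_pairs])

gram = basis @ basis.T                            # pairwise inner products over k
max_deviation = float(np.max(np.abs(gram - np.eye(len(index_pairs)))))
print(max_deviation)
```

Because the sequences decay only like $1/|k|$, the truncated Gram matrix matches the identity only approximately; the deviation shrinks as the index range grows.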

Theorem 2: The sequences $\{W_{m,n}(k)\}_{k\in\mathbb{Z}}$ span $\ell^2$. Thus they comprise an orthonormal basis for the space of discrete-time finite energy signals.

Proof: For any discrete signal $f \in \ell^2$ whose DTFT $\hat f$ is periodic with period $2\pi$, an analog signal $f^a$ can be constructed by inverse Fourier transform of the frequency domain function defined by

$$\hat f^a(\omega) = \begin{cases} \hat f(\omega) & |\omega| < \pi \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

The function $f^a$ can be expressed as a weighted sum of the orthogonal basis elements $W^a_{m,n}$. Thus

$$f^a(t) = \sum_m \sum_n a_{m,n}\,W^a_{m,n}(t)$$

and

$$f(k) = \sum_m \sum_n a_{m,n}\,W_{m,n}(k).$$

Hence $\{W_{m,n}\}$ spans $\ell^2$. ∎

The procedure just described may be applied to other frequency-domain wavelet basis constructions using bandlimited wavelets to yield other discrete orthogonal bases for $\ell^2$.

Some additional properties of the above construction are listed below without proof. Let $f(k)$ be as in equation (5) and define

$$g(k) = \sum_m \sum_n b_{m,n}\,W_{m,n}(k) \quad \text{and} \quad h(k) = \sum_m \sum_n c_{m,n}\,W_{m,n}(k).$$

Then

• Linearity: If $h = f + g$ then $c_{m,n} = a_{m,n} + b_{m,n}$.

• Convolution: If $h = f * g$ (i.e., $h(n) = \sum_k f(k)\,g(n - k)$) then $c_{m,n} = a_{m,n} * b_{m,n} = 2^{m/2} \sum_k a_{m,n-k}\,b_{m,k}$.

• Dyadic dilation: $W_{m,0}(k) = \sqrt{2}\,W_{m+1,0}(2k)$.
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1.  INTRODUCTION 

The ability to reconstruct a complex-valued signal on $\mathbb{R}$ from a sequence of sample values $\{f(t_n)\}_{n\in\mathbb{Z}}$ is desirable in a variety of engineering applications. While this problem is ill-posed in general, many reconstruction formulas of the form

$$f(t) = \sum_{n\in\mathbb{Z}} f(t_n)\,\psi_n(t) \qquad (1)$$

have been obtained for various restricted classes of functions.

It was observed in [1] that such a formula for reconstruction of functions from a given class $C$ extends directly to a reconstruction formula for functions formed by composition of any $f \in C$ with an invertible function $\gamma : \mathbb{R} \to \mathbb{R}$. Application of a coordinate transformation such as $\gamma$ to the domain of a signal is commonly called "time-warping" in signal processing literature. Consequently, signals of this type have become known as "time-warped" signals.

Among the most important formulas of the type (1) are connected with reconstruction of bandlimited signals; i.e., functions having the form

$$f(t) = \int_{-\Omega}^{\Omega} \hat f(\omega)\,e^{i\omega t}\,d\omega \qquad (2)$$

where $\hat f \in L^2(\mathbb{R})$ and $0 < \Omega < \infty$. Motivated by their reconstructability from samples, this note presents some comments on the class $B \circ \Gamma$ of time-warped bandlimited signals; i.e., functions of the form $f \circ \gamma$ with $f$ belonging to the class $B$ of bandlimited signals and $\gamma : \mathbb{R} \to \mathbb{R}$ belonging to a class $\Gamma$ of continuous and invertible warping functions.

2.  RESULTS 

The perspective of Paley and Wiener [3] that it is natural to consider bandlimited functions on the complex domain is adopted in what follows. It thus becomes necessary to consider warping functions on $\mathbb{C}$ as well. Given a bandlimited function $f : \mathbb{R} \to \mathbb{C}$, denote by $F$ the corresponding entire function with values defined by

$$F(z) = \int_{-\Omega}^{\Omega} \hat f(\omega)\,e^{i\omega z}\,d\omega, \qquad z \in \mathbb{C}.$$

Similarly, given $h \in B$, denote by $H$ the associated entire function. Define $\mathcal{G}$ to be the collection of all continuous functions $G : \mathbb{C} \to \mathbb{C}$ with restrictions $\gamma$ to $\mathbb{R}$ that are real-valued and bijective. If $G \in \mathcal{G}$ then the corresponding $\gamma \in \Gamma$ is well defined. Thus, given bandlimited functions $F$ and $H$ on the complex domain, finding $G \in \mathcal{G}$ such that $H = F \circ G$ ensures that there is some $\gamma \in \Gamma$ such that $h = f \circ \gamma$. Given $\gamma \in \Gamma$ such that $h = f \circ \gamma$, however, there is no a priori guarantee that any $G \in \mathcal{G}$ exists with the property that $H = F \circ G$. In this sense, considering complex warping functions in $\mathcal{G}$ is more restrictive than considering real-valued warping functions in $\Gamma$.

Theorem 1: If $f \in B$ is not identically zero and $G \in \mathcal{G}$, then $H = F \circ G$ is bandlimited if and only if $G$ is affine.

It  is  clear  that  H  =  F  oG  will  be  bandlimited  if  G  is  affine.  The 
proof  of  the  “only  if"  part  of  this  theorem  is  based  on  the  growth 
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properties  of  the  entire  functions  F  and  H.  Specifically,  it  relies  on  the 
following  results. 

Lemma: Suppose $G \in \mathcal{G}$, $f \in B$ is not identically zero, and $H = F \circ G$ is bandlimited. Then $G$ is entire.

Theorem 2 (from [4]): If $F$ and $G$ are entire and the order of $F \circ G$ is finite, then either (i) $G$ is a polynomial and the order of $F$ is finite, or (ii) $G$ is a non-polynomial function of finite order and the order of $F$ is zero.

Theorem 3 (based on results from [4]): If $f \in B$ is not identically zero and $G$ is a polynomial of degree $n > 1$, then the order of $H = F \circ G$ is greater than one.

The proof of Theorem 1 proceeds as follows. Assuming $H$ is bandlimited, the Lemma establishes that $G$ is entire. Theorem 2 may be applied to show that $G$ is a polynomial. Theorem 3 implies that the degree of $G$ is either zero or one. If $G$ were constant then $H$ would be constant. Since $h \in L^2$, it cannot be constant without being identically zero. Thus $G$ is a polynomial of degree exactly one; i.e., $G(z) = az + b$ with $a \ne 0$. The condition that $\gamma$ is real valued implies that $a$ and $b$ are real. Hence $\gamma(t) = at + b$ for real numbers $a$ and $b$ with $a \ne 0$.

3.  DEMODULATION 

Earlier work [2] has established that $B \circ \Gamma$ contains all bandlimited functions and many nonbandlimited functions, but not all of $L^2$. A remaining issue is that of demodulation: given $h \in B \circ \Gamma$, can it be decomposed into a bandlimited function $f$ and a bijective monotone time warping function $\gamma$?

If $h \in B \circ \mathcal{G}$, then there are necessarily many ways to express $h$ as a composition $f \circ \gamma$. Given any $a > 0$, for example, define functions $f_1$ and $\gamma_1$ by $f_1(t) = f(at)$ and $\gamma_1(t) = \gamma(t)/a$. Then $f_1 \in B$, $\gamma_1 \in \mathcal{G}$, and $f_1 \circ \gamma_1 = f \circ \gamma = h$. This kind of representational ambiguity can be circumvented by stipulating that $f$ have exactly unit bandwidth. In this case, the question of representational ambiguity may be addressed by a corollary to Theorem 1.
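The rescaling ambiguity described above is easy to see numerically. In the sketch below, the particular bandlimited function (a sinc), the warping function $\gamma(t) = t + 0.1t^3$ and the scale factor $a = 2.5$ are illustrative assumptions, not examples from the note; any bandlimited $f$, any continuous strictly monotone bijection $\gamma$ and any $a > 0$ would serve.

```python
import numpy as np

def f(t):
    """An example bandlimited signal (assumed): the normalized sinc function."""
    return np.sinc(t)

def gamma(t):
    """An example warping function (assumed): continuous, increasing, bijective on R."""
    return t + 0.1 * t ** 3

a = 2.5                               # any positive scale factor

def f1(t):
    return f(a * t)                   # bandwidth of f scaled by a

def gamma1(t):
    return gamma(t) / a               # warping compressed by the same factor

t = np.linspace(-3.0, 3.0, 601)
h = f(gamma(t))                       # the time-warped signal h = f o gamma
h_alt = f1(gamma1(t))                 # a second decomposition of the same h
print(float(np.max(np.abs(h - h_alt))))
```

Both pairs produce the same warped signal, so without a normalization such as unit bandwidth the pair $(f, \gamma)$ cannot be recovered from $h$ alone.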

Corollary [of Theorem 1]: Suppose $h = f_1 \circ \gamma_1 = f_2 \circ \gamma_2$ with $f_1$ and $f_2$ having exactly unit bandwidth and $\gamma_1, \gamma_2 \in \mathcal{G}$. Then $f_1(t) = f_2(t - b)$ and $\gamma_1(t) = \gamma_2(t) + b$ for some real constant $b$ and all $t \in \mathbb{R}$.

References

[1] J.J. Clark, M.R. Palmer, and P.D. Lawrence, "A transformation method for the reconstruction of functions from nonuniformly spaced samples," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-33(4), pp. 1151-1165, October 1985.

[2] D. Cochran and J.J. Clark, "On the sampling and reconstruction of time-warped bandlimited signals," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1539-1541, April 1990.

[3] R.E.A.C. Paley and N. Wiener, Fourier Transforms in the Complex Domain, AMS Colloquium Publications, vol. XIX, American Mathematical Society, 1934.

[4] G. Pólya, "On an integral function of an integral function," Journal of the London Mathematical Society, vol. 1, pp. 12-15, 1926.




A  UNIFIED  PARTITIONING  AND  FOLDING  PROCEDURE 
FOR  SYSTOLIC  ALGORITHMS 

Flavio  Lorenzelli  and  Kung  Yao 
Electrical  Engineering  Department 
University  of  California,  Los  Angeles,  CA  90024-1594 


In many signal processing applications related to estimation or Kalman filtering problems, efficient techniques such as QR decomposition, recursive least squares (RLS), etc., are widely employed. Real-time use of such techniques is made possible through the development of systolic algorithms (SAs) and their mapping onto modular parallel processor arrays. We present a linear mapping procedure for SAs based on integral matrix theory which includes partitioning, folding, and predefined design constraints. Both partitioning and folding introduce useful degrees of freedom in the design of the final array.

Mapping  Procedure 

In order to apply the integral matrix theory to SA mapping, the algorithm must be described in geometrical terms as a dependence graph (DG), i.e., a lattice embedded in a multidimensional integral space. Here, we propose a mapping procedure for SAs which can include partitioning (locally sequential-globally parallel, LSGP, as well as locally parallel-globally sequential, LPGS) and folding, and takes into account predefined design constraints. To partition an algorithm means to break it into components of smaller size. These components can be physically executed in parallel or in a sequence. Care must be taken to satisfy the constraints dictated by the partial ordering among computations, the locality of the data flow, etc. By folding we denote the operation of displacing sections of the projected graph in order to obtain a desired pattern. This kind of operation is usually highly non-linear at the physical array level, and is normally performed in a heuristic manner, after projection. Classical systolic mapping procedures are based on linear or affine transformations, which on the one hand make the design simple and manageable, by using well-understood tools, but on the other hand limit the range of manipulations that can be applied to the original algorithm. Nonlinearity can be introduced in the procedure in order to make efficient use of computing and memory elements. Both partitioning and folding can be included in a unified linear mapping procedure. Partitioning can be included as a natural extension of the general mapping procedure, whereas folding requires a preliminary transformation of the DG. The idea is to artificially increase the dimensionality of the DG of the original algorithm, by fragmenting one or more of the old dimensions. In so doing, folding can actually become possible, without having to introduce nonlinear allocating functions.

As mentioned above, the algorithm must first undergo a number of refinements (regularization, single assignment form, etc.), so that the associated DG has the desired properties of locality and shift invariance. The DG $\mathcal{G}$ can be seen as an integral lattice, i.e., a proper convex and bounded subset of $\mathbb{Z}^n$, $\mathbb{Z}$ being the set of relative integers. Each node of the graph represents a computation, whereas the directed arcs connecting the nodes (dependences) represent the dependence relationships between computations. The dependence matrix $D$ collects the dependence vectors $\delta_i$ as columns. In the sequel assume for simplicity's sake the DG to be parallelepipedal:

$$\mathcal{G} = \{(i_1, \dots, i_n) \in \mathbb{Z}^n \mid l_j \le i_j \le u_j,\ j = 1, \dots, n\},$$

for given lower and upper limits $l_j < u_j$, $j = 1, \dots, n$.

Suppose we want to project the $n$-dimensional DG onto an $(n - p)$-dimensional array. The $p$ projection vectors are collected in the $n \times p$ matrix $U_-$, the left submatrix of the unimodular matrix $U = [U_-, U_+]$. The $(n - p)$ columns of the matrix $E_+$, the right submatrix of $E = [E_-, E_+] = U^{-T}$, by definition orthogonal to the $p$ projection vectors, can be considered as a valid basis for the processor space, so that a given node $J \in \mathcal{G}$ is projected onto the point $J' = E_+^T J$ in the processor space. The scheduling is also defined by $p$ vectors, stacked as columns of the matrix $\Lambda_-$. (Again, the matrix $\Lambda = [\Lambda_-, \Lambda_+]$ is the unimodular extension of $\Lambda_-$.) The matrix $\Lambda_-$ must be chosen in such a way as to maintain the ordering of the operations (precedence constraint). Furthermore, no two computations may be mapped on the same cell at the same time (compatibility constraint). The first constraint can be formally stated as follows:

$$\Lambda_-^T \delta_i \ge 0, \qquad \mathbf{1}^T \Lambda_-^T \delta_i \ge 1, \qquad \forall\, \delta_i.$$

The compatibility constraint can be specified only when a scheduling function is chosen. A natural definition is the affine function $\tau(J) = \nu^T \Lambda_-^T J + \zeta(J)$, where the components of the $p \times 1$ vector $\nu$ and the affine constant $\zeta(\cdot)$ are suitably chosen. The compatibility constraint requires then that

$$\tau(J_1) \ne \tau(J_2), \quad \text{if } J_1 \ne J_2 \text{ and } E_+^T J_1 = E_+^T J_2.$$

These equations immediately become $J_1 = J_2 + U_- k$, for some $p$-vector $k \ne 0$,

$$\nu^T \Lambda_-^T U_- k \ne \zeta(J_2) - \zeta(J_1), \qquad \forall\, k \ne 0.$$

These constraints, in addition to the minimization of the overall computation time, can be used to choose the right values for $\nu$ and the constants $\zeta(\cdot)$.
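The projection and the two constraints can be tried out on a toy case. The sketch below is illustrative only: the 4 × 4 dependence graph, the unimodular matrix `U`, the single schedule vector `schedule` and the two dependence vectors are assumptions chosen for this example ($n = 2$, $p = 1$, zero affine constant), not values from the paper.

```python
import itertools
import numpy as np

p = 1                                            # number of projection vectors
U = np.array([[1, 0],
              [1, 1]])                           # unimodular; first column is the projection vector
E = np.rint(np.linalg.inv(U).T).astype(int)      # E = U^{-T}, integral because det U = 1
U_minus, E_plus = U[:, :p], E[:, p:]             # projection directions / processor basis

schedule = np.array([1, 1])                      # assumed linear schedule: time = i1 + i2
dependences = np.array([[1, 0],
                        [0, 1]])                 # assumed dependence vectors, as columns

nodes = [np.array(J) for J in itertools.product(range(4), repeat=2)]

def processor(J):
    """Processor coordinates J' = E_plus^T J of a DG node."""
    return tuple(int(c) for c in E_plus.T @ J)

def time_step(J):
    """Time at which node J is executed under the assumed schedule."""
    return int(schedule @ J)

processors = {processor(J) for J in nodes}
slots = {(processor(J), time_step(J)) for J in nodes}

print("processor basis orthogonal to projection:", bool(np.all(E_plus.T @ U_minus == 0)))
print("precedence satisfied:", bool(np.all(schedule @ dependences >= 1)))
print("compatibility satisfied:", len(slots) == len(nodes))
print("processors used:", len(processors))
```

Here the 16 nodes are mapped onto a linear array, and each (processor, time) pair is used at most once because nodes that share a processor differ by a multiple of the projection vector, which the schedule separates in time.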

The matrix $M = E_+^T \Lambda_+$ is related to the partitioning of the algorithm, whenever its determinant has absolute value larger than unity. It can be proven that the columns of $M$ define the shape of each component of the partitioned algorithm and that its determinant is related to the number of nodes constituting the component (its "volume"). This mathematical framework also allows the inclusion of design constraints in the mapping procedure, dealing with the interconnection pattern, I/O port location, etc. These requirements affect the choice of both the projection matrix $U$ and the scheduling matrix $\Lambda$.

Folding can be included in the procedure by first artificially increasing the dimensionality of the DG. One way to achieve this is to fragment some or all dimensions of the original DG by, e.g., limiting the number of nodes along any given coordinate axis to a prespecified amount (say, $n_l$ nodes along direction $l$). A natural way to do this is to split index $i_l$ into the two indices $i_l'$ and $i_l''$, so that $i_l' = i_l \bmod n_l$, $i_l'' = \lfloor i_l / n_l \rfloor$. The DG is now composed of a number of different regions which can be mapped with some degree of independence. One needs to take care of the data movement between the newly formed regions, since it can result in non-local interconnections. If global links are not permitted, this poses a constraint on the projection vector. Besides, the scheduling function must allow a correct timing of the data transition between regions. The modification of the DG has to be done in a preliminary step, so that the mapping procedure can be applied in the normal fashion starting from this higher dimensional graph.
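The index fragmentation used for folding amounts to an integer division with remainder. A minimal sketch, with hypothetical helper names `split_index` and `merge_index` and an assumed block size of 4 nodes along one axis:

```python
def split_index(i, block):
    """Split a nonnegative index i into (position inside its block, block number)."""
    return i % block, i // block

def merge_index(inner, outer, block):
    """Inverse of split_index: recover the original index."""
    return outer * block + inner

block = 4                                   # assumed limit on nodes along this axis
original = list(range(12))                  # one axis of the original dependence graph
fragmented = [split_index(i, block) for i in original]
recovered = [merge_index(inner, outer, block) for inner, outer in fragmented]

print(fragmented)
```

Each original axis of length 12 becomes a 4 × 3 grid of (inner, outer) indices, so the dependence graph gains one dimension while every node keeps a unique, invertible label.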
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Abstract* 

Transform codes are used to study low-rate quantization of stationary Gaussian sources. The transform decorrelates the source samples and then scalar quantization is applied to the vector of transform coefficients. Two bit allocations are considered: the first permits only zero or one bit to be allocated to each transform coefficient (i.e., the scalar quantizers have only one or two levels), and the second is an optimal bit allocation. For the transform codes with the "0-1" bit allocation, a closed-form, parametric expression is derived for the asymptotic (with dimension) rate vs. distortion performance. This expression is compared to the rate-distortion function, as well as to the performance of transform codes with optimal bit allocations. The principal result is that there is a critical rate, determined by the power spectral density, below which (and only below which) 0-1 allocations are optimal. This is a unique result in that it determines optimal theoretical performance for an important class of vector quantizers at low rates. Quantitative results are presented for Gauss-Markov sources.

Summary 

Whereas a well understood body of theory exists for the analysis of high-rate quantization systems (see for example [1] or [2, ch. 5]), a general theory for analyzing and designing low-rate quantization systems is not yet available. In this paper we analyze two classes of low-rate transform codes. We examine rates less than (and frequently much less than) 1 bit/sample and allow the quantizer dimension (or block length) to become very large so that asymptotic methods may be applied. We consider only discrete-time, stationary Gaussian sources and mean-squared error as a fidelity criterion.

Transform-based source coding systems are examples where there is a need to design simple, low-rate quantizers for "lower energy" transform coefficients. We are able to show that for low rates, the Karhunen-Loeve transform is the optimal transform among the class of orthogonal transforms. Hence the coefficients are also Gaussian with variances that are the eigenvalues of the source covariance matrix. As blocklength becomes large we can determine the asymptotic distribution of the coefficient variances via Szego's Theorem for Toeplitz forms [3, ch. 5].

We first propose a product code that scalar quantizes each of the transformed components at rate either 0 or 1 bit/sample; we refer to the resulting transform code as a 0-1 transform code. The 0-1 transform codes may be designed from rates 0 to 1 bit/sample. For asymptotically large block lengths we find the following parametric expressions for the rate and distortion of 0-1 transform codes,

$$R_\alpha = \frac{1}{2\pi}\int_{\{\omega:\ \Phi(\omega) > \alpha\}} d\omega \qquad (1)$$

$$D(R_\alpha) = \frac{1}{2\pi}\int_{\{\omega:\ \Phi(\omega) > \alpha\}} c(1)\,\Phi(\omega)\,d\omega + \frac{1}{2\pi}\int_{\{\omega:\ \Phi(\omega) \le \alpha\}} \Phi(\omega)\,d\omega \qquad (2)$$

where $\Phi(\cdot)$ is the power spectral density of the source, $\inf_\omega \Phi(\omega) \le \alpha < \sup_\omega \Phi(\omega)$, and $c(r)$ is the mean squared error of a $2^r$-level Lloyd-Max quantizer for a unit-variance Gaussian source. Thus by varying the free parameter $\alpha$, rates from 0 to 1 bit/sample can be achieved. Comparisons are made between the above expressions and the source rate-distortion function. In particular, we find that the distortion penalty above the general Gaussian distortion-rate function is the same as that of Lloyd-Max quantizers above the iid distortion-rate function.
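The parametric description above can be traced numerically: coefficients whose spectral value exceeds the threshold $\alpha$ receive one bit and contribute distortion $c(1)\Phi(\omega)$, the rest receive no bits and contribute their full variance. The sketch below assumes a unit-variance Gauss-Markov (first-order autoregressive) spectrum with correlation `rho`; the function names, the grid size and the value `rho = 0.9` are choices made for this illustration. The constant $c(1) = 1 - 2/\pi$ is the mean squared error of the two-level Lloyd-Max quantizer for a unit-variance Gaussian.

```python
import numpy as np

C1 = 1.0 - 2.0 / np.pi      # c(1): MSE of the 2-level Lloyd-Max quantizer, unit Gaussian

def gauss_markov_psd(omega, rho):
    """Power spectral density of a unit-variance first-order Gauss-Markov source."""
    return (1.0 - rho ** 2) / (1.0 - 2.0 * rho * np.cos(omega) + rho ** 2)

def zero_one_rate_distortion(alpha, rho, num_points=200_000):
    """Rate (bits/sample) and distortion of the 0-1 allocation with threshold alpha."""
    # Midpoint grid on [-pi, pi); a grid average approximates (1 / 2 pi) * integral.
    omega = -np.pi + 2.0 * np.pi * (np.arange(num_points) + 0.5) / num_points
    psd = gauss_markov_psd(omega, rho)
    one_bit = psd > alpha                        # frequencies that receive one bit
    rate = float(np.mean(one_bit))
    distortion = float(np.mean(np.where(one_bit, C1 * psd, psd)))
    return rate, distortion

rho = 0.9
curve = {alpha: zero_one_rate_distortion(alpha, rho) for alpha in (0.0, 0.5, 1.0, 5.0, 1e9)}
for alpha, (rate, distortion) in curve.items():
    print(f"alpha={alpha:g}  rate={rate:.4f}  distortion={distortion:.4f}")
```

Sweeping the threshold from below the spectral minimum to above the spectral maximum moves the operating point from 1 bit/sample with distortion $c(1)$ down to 0 bits/sample with distortion equal to the source variance.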

The principal result of our work is as follows. For any Gaussian random process, there exists a critical rate $r_0$ below which 0-1 codes are the optimal transform codes. In particular,

$$r_0 = \frac{1}{2\pi}\int_{\{\omega:\ \Phi(\omega) > \beta\,\Phi^*\}} d\omega,$$

where $\Phi^* = \operatorname{ess\,sup}_\omega \Phi(\omega)$, and

$$\beta \triangleq \frac{c(1) - c(2)}{1 - c(1)}.$$

Knowledge of the process power spectral density is, therefore, sufficient to determine this critical rate. Our specific result is a coding theorem which states that below $r_0$, 0-1 codes are optimal, and above $r_0$ there exist other transform codes strictly better than 0-1 codes. Since the asymptotic distortion versus rate performance of 0-1 codes has been derived in (1) and (2), the optimal theoretical performance (OPTA) for any transform code on a particular source has now been found for all rates less than $r_0$. This represents, to our knowledge, the first complete characterization of the theoretically achievable performance of an important class of quantizers at low rates.

We demonstrate our results for Gauss-Markov sources, and make performance comparisons to other transform codes. The implications of this theory to the design of practical, low-rate codes are discussed.
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ABSTRACT  The  problem  of  designing  a  TSVQ  from  random 
data  has  received  considerable  recent  attention.  A  key  step  in 
many  methods  of  design  is  the  application  of  a  greedy  growing 
algorithm  to  the  empirical  distribution  of  the  data.  In  applications 
of  interest  to  us  these  empirical  distributions  are  of  vectorial  pixel 
intensities.  Here  we  analyze  the  behavior  of  the  greedy  growing 
algorithm  when  it  is  applied  to  the  true  underlying  distribution 
of  the  observations,  and  we  show  that  quantizers  produced  from 
large  data  sets  will  be  close  to  quantizers  produced  from  the  true 
distribution. 

I.  INTRODUCTION 

The problem of designing a tree-structured vector quantizer (TSVQ) from a sequence of random observations has received considerable attention ([2],[3]). Recall that a TSVQ is completely specified by a binary tree $T$ whose nodes are labeled by points in $\mathbb{R}^d$; denote the corresponding quantizer by $Q_T$. Let $X_1, X_2, \dots$ be a stationary ergodic sequence of random vectors $X_i \in \mathbb{R}^d$ having distribution $P$. The rate of a tree $T$ is the expected depth of $T$ or, equivalently, the expected number of comparisons required to encode a random vector $X_i$. The design problem is as follows:
Given $X_1, \dots, X_n$ and a rate $R > 0$, find a tree $T_n$ whose rate is not more than $R$ and whose distortion $E\|X - Q_{T_n}(X)\|^2$ is small.

Here $X$ is a random variable distributed as $P$, but which is independent of the process $\{X_i\}$, and $\|\cdot\|$ denotes the ordinary Euclidean norm on $\mathbb{R}^d$.

One approach to the design problem, based on [1], is to apply a greedy growing algorithm to the empirical distribution $P_n = n^{-1}\sum_{i=1}^{n} \delta_{X_i}$ of the data $X_1, \dots, X_n$. The algorithm "grows" a TSVQ in a step-wise optimal fashion, terminating when it obtains a tree whose (empirical) rate is greater than $R$. More precisely, given a distribution $H$ on $\mathbb{R}^d$, and a rate $R > 0$, the algorithm produces a nested sequence of trees, each of which corresponds to a successive hyperplane-based partitioning of $\mathbb{R}^d$. At each stage, the algorithm splits any terminal node $v$ of the current tree whose corresponding cell $V$ maximizes the ratio $\Delta D^*(V)/\Delta R(V)$ over all such cells. Here $\Delta D^*(V)$ is the greatest reduction in distortion (with respect to $H$) achievable by a hyperplane split of $V$, and $\Delta R(V) = H(V)$ is the increase in rate associated with splitting the node $v$. The children of $v$ are labeled by the centroids of the regions created when $V$ is split. The algorithm terminates when every split of the sort described would make the overall rate of the next tree greater than $R$. Use of the splitting criterion $\Delta D^*(V)/\Delta R(V)$ amounts to a steepest descent in the rate-distortion plane.
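The growing rule described above can be sketched in one dimension, where a hyperplane split is simply a threshold. The code below is a simplified illustration and not the authors' implementation: the function name `greedy_grow_1d`, the use of scalar data, and the stopping test applied only to the ratio-maximizing split are assumptions made here. Cells are index ranges into the sorted sample, $\Delta R(V)$ is the fraction of samples in the cell, and $\Delta D^*(V)$ is the largest drop in squared error over all thresholds.

```python
import numpy as np

def greedy_grow_1d(samples, max_rate):
    """Greedy growing on the empirical distribution of scalar data.

    Returns (codebook, rate, distortion) of the final tree.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    s1 = np.concatenate(([0.0], np.cumsum(x)))        # prefix sums of x
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))    # prefix sums of x^2

    def sse(lo, hi):
        """Sum of squared errors of x[lo:hi] about its centroid."""
        total = s1[hi] - s1[lo]
        return (s2[hi] - s2[lo]) - total * total / (hi - lo)

    def best_split(lo, hi):
        """Largest distortion reduction for cell [lo, hi) and where to cut."""
        if hi - lo < 2:
            return 0.0, None
        cuts = np.arange(lo + 1, hi)
        left = (s2[cuts] - s2[lo]) - (s1[cuts] - s1[lo]) ** 2 / (cuts - lo)
        right = (s2[hi] - s2[cuts]) - (s1[hi] - s1[cuts]) ** 2 / (hi - cuts)
        gains = sse(lo, hi) - left - right
        best = int(np.argmax(gains))
        return float(gains[best]) / n, int(cuts[best])

    leaves = [(0, n)]
    depth_mass = 0                                     # n times the expected depth
    while True:
        choice = None
        for idx, (lo, hi) in enumerate(leaves):
            gain, cut = best_split(lo, hi)
            if cut is None or gain <= 0.0:
                continue
            ratio = gain / ((hi - lo) / n)             # Delta D* / Delta R
            if choice is None or ratio > choice[0]:
                choice = (ratio, idx, cut)
        if choice is None:
            break
        _, idx, cut = choice
        lo, hi = leaves[idx]
        if depth_mass + (hi - lo) > max_rate * n:      # next tree would exceed the rate
            break
        leaves[idx:idx + 1] = [(lo, cut), (cut, hi)]
        depth_mass += hi - lo
    distortion = sum(sse(lo, hi) for lo, hi in leaves) / n
    codebook = [float(np.mean(x[lo:hi])) for lo, hi in leaves]
    return codebook, depth_mass / n, distortion

rng = np.random.default_rng(0)
data = rng.uniform(size=2000)
codebook, rate, distortion = greedy_grow_1d(data, max_rate=2.0)
print(len(codebook), rate, distortion)
```

Prefix sums make each candidate threshold cheap to score, so the sketch stays close to the description: one leaf is split per step, its children are labeled by centroids, and growth stops once the preferred split would push the expected depth above the rate budget.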

II.  RESULTS 

Throughout we assume that the distribution $P$ of the random
vectors $X_i$ has bounded support, and that $P$ has a density with
respect to Lebesgue measure. Our first three results pertain to
the “true” distribution $P$, while the last one pertains to random
data.


Lemma 1 If $R < \infty$ then the greedy growing algorithm will
terminate in a finite number of steps, producing a finite tree.

Although the assumption that $P$ has bounded support is
restrictive, it is necessary to ensure termination of the greedy
growing algorithm.

Proposition 1 Let $H = \exp(1)$ be the ordinary one-sided
exponential. Then if $R$ is larger than some fixed constant $R_0$, the
greedy growing algorithm will not terminate.

Note  that  the  output  of  the  greedy  algorithm  need  not  be 
unique.  At  some  stage  of  the  algorithm  there  may  be  a  multiplic¬ 
ity  of  optimal  splits,  involving  one  or  more  nodes;  the  algorithm 
selects  one  of  these,  and  subsequent  splits  are  made  accordingly. 
In  this  context  we  can  strengthen  Proposition  1. 

Theorem 1 For every $R > 0$ there is a constant $K < \infty$,
depending on $R$, such that every tree produced by greedy growing has
maximum depth less than $K$.

Definition: Let $\mathcal{A}(H, R)$ be the collection of trees produced by
the greedy algorithm acting on a fixed distribution $H$ and any rate
$R' < R$.

The following consistency result shows that quantizers
produced by the algorithm when it is applied to the empirical
distribution $P_n$ will eventually be close to quantizers produced by
the algorithm when it is applied to the true distribution $P$ of the
observations.

Theorem 2 Fix $\epsilon > 0$ and for $n = 1, 2, \ldots$ let $T_n \in \mathcal{A}(P_n, R)$.
With probability one there exists a sequence $T_n' \in \mathcal{A}(P, R)$ such
that

$P\{\|Q_{T_n} - Q_{T_n'}\| > \epsilon\} \to 0$

as n tends to infinity.

Abstract

Clustering  algorithms  can  be  applied  to  the  design  of  N  codebooks 
to  be  shared  by  M  sources,  1  <  M  <  N.  We  previously  introduced 
a  constrained  storage  vector  quantization  algorithm  for  this  design 
problem.  In  this  work,  we  extend  the  algorithm  to  additionally  design 
simple  parametric  expandor  functions  to  enhance  codebook  sharing 
efficiency.  We  apply  the  particular  case  of  scaling  expandor  func¬ 
tions  to  the  compression  of  tree  structured  vector  quantization  code¬ 
books.  By  allowing  different  levels  of  the  tree  codebook  to  share  a 
library  of  feature  (residual)  codebooks,  we  were  able  to  achieve  in 
our experiment a 4 : 1 reduction of storage without compromising
rate-distortion  performance.  For  very  deep  trees,  an  earlier  design 
method  which  effects  sharing  only  within  each  level  of  the  tree  is 
more  effective. 

Summary 

Clustering  is  a  popular  means  to  codebook  “training,”  i.e.  to  reduce 
a  “training  set”  characterization  of  the  probability  distribution  of  a 
source  to  a  smaller  set  of  representative  vectors.  The  most  common 
clustering  algorithm  for  codebook  design  is  perhaps  the  generalized 
Lloyd  algorithm  (GLA)  [1],  also  known  as  the  K-means  algorithm. 
Recently, we introduced what could be considered as a generalization
of the GLA, called the constrained storage vector quantization
(CSVQ) algorithm [2], for clustering a set of N sources to share M
codebooks, 1 < M < N. The CSVQ algorithm was introduced to
control  the  storage  complexity  for  one  feature  at  a  time  in  generalized 
product  codes  [3].  The  algorithm  has  been  applied  to  improve  the 
performance  of  multistage  VQ  (MSVQ)  [2]  and  restrict  the  storage 
complexity  of  tree  structured  VQ  (TSVQ)  [4].  Chou  [5]  listed  various 
other  applications  of  GLA  clustering. 

In [6], Lindsay et al. described various ways of restricting the
orientation of the partitioning hyperplanes of a binary TSVQ codebook
to achieve storage reduction. Alternatively, the method we described in
[4] requires the TSVQ codebook to be stored in an equivalent trellis
of  residual  feature  codebooks  [3].  In  other  words,  TSVQ  is  equivalent 
to  MSVQ  except  that  TSVQ  has  path-dependent  residual  codebooks. 
The  trellis  is  a  consequence  of  codebook  sharing  by  the  conditional 
residual  sources  in  one  stage;  without  sharing,  the  trellis  becomes  a 
tree.  The  sharing  is  instrumented  by  applying  the  CSVQ  algorithm 
to  grow  the  tree  one  level  at  a  time.  Linear  growth,  as  opposed  to  ex¬ 
ponential  growth,  in  storage  complexity  with  rate  has  been  achieved 
while  incurring  virtually  no  rate-distortion  penalty  [4].  While  this 
method  was  used  to  design  binary  TSVQ  codebooks  up  to  many  lev¬ 
els  (say  26),  for  codebooks  of  moderate  depth,  it  is  possible  to  achieve 
sharing  of  the  feature  codebooks  over  the  entire  tree  rather  than  just 
one  level  at  a  time. 

Suppose there are N sources sharing M codebooks, 1 < M < N.
Associated with source $i \in \{1, \ldots, N\}$ is a codebook pointer $\mu_i$ and a
parametric expandor function $h_i$, with parameters $p_{i,1}, \ldots, p_{i,J}$. The
code vectors $c$ for encoding the $i$-th source are obtained from the
codebook whose address is given by the value of $\mu_i$. Each code vector
$v$ from the codebook is transformed by the expandor function,
viz. $c = h_i(v, p_{i,1}, \ldots, p_{i,J})$. Examples of simple expandor
transformations are scaling, translation, and rotation, whose parameters
are respectively a scalar gain, a vector offset, and a unitary matrix.
In  the  cases  of  scaling  and  translation,  it  is  fairly  straightforward  to 
modify  the  CSVQ  algorithm  to  jointly  design  the  shared  codebooks, 
the pointers, and the expandor parameters [7]. In particular, the
relatively simple gain-companding CSVQ enables sharing of residual
codebooks across all or a subset of levels of a tree. The shared
codebooks are stored in a library and the tree nodes are populated with
codebook pointers and gain parameters. For an 11-level TSVQ of a
source  of  high-fidelity  audio  transform  coefficient  vectors  [4],  we  have 
obtained  a  compression  ratio  of  4  :  1  with  virtually  no  rate-distortion 
penalty. This ratio is relatively modest in comparison with the orders
of magnitude of storage reduction obtained for very deep trees grown
level by level with CSVQ [4]. Nevertheless, a higher compression ratio
could be attained if rotation is also incorporated into the
companding, as suggested by the asymptotic results of Lee et al. [8]. However,
the required rotation matrices increase both the storage and
processing requirements, so that the overall complexity-performance tradeoff
is not necessarily improved. This problem will be further explored.
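
The pointer-plus-gain arrangement described above can be sketched as follows. This is only an illustration under stated assumptions: the library contents, the per-source table, and the function names (`expand`, `encode`, `decode`, `storage`) are hypothetical, and only the scaling expandor $c = g_i v$ is shown, not the joint CSVQ design procedure.

```python
def nearest(x, codebook):
    """Index of the code vector closest to x in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(range(len(codebook)), key=lambda k: dist(codebook[k]))

# Hypothetical library of M = 2 shared codebooks.
LIBRARY = [
    [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)],
    [(1.0, 1.0), (-1.0, -1.0)],
]

# Hypothetical per-source table: (codebook pointer mu_i, scalar gain g_i).
SOURCES = {0: (0, 4.0), 1: (0, 0.5), 2: (1, 2.0)}

def expand(i):
    """Code vectors for source i: c = g_i * v for every v in codebook mu_i."""
    mu, gain = SOURCES[i]
    return [tuple(gain * a for a in v) for v in LIBRARY[mu]]

def encode(i, x):
    """Nearest-neighbor index of x among the expanded code vectors of source i."""
    return nearest(x, expand(i))

def decode(i, index):
    """Reproduction vector of source i for a given index."""
    return expand(i)[index]

def storage():
    """Scalars stored with sharing (library + pointer and gain per source)
    versus without sharing (one full codebook per source)."""
    shared = sum(len(v) for cb in LIBRARY for v in cb) + 2 * len(SOURCES)
    unshared = sum(len(v) for i in SOURCES for v in expand(i))
    return shared, unshared
```

In this toy setting the saving is small because the codebooks are tiny; the benefit grows with the number of sources that point to the same library entry.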

SELF SYNCHRONISING T-CODES TO REPLACE HUFFMAN CODES

Abstract - This paper describes recent work on the T-Codes, which
are  a  new  class  of  variable  length  codes  with  superlative  self- 
synchronizing  properties.  The  T-Code  construction  algorithm  is 
outlined,  and  it  is  shown  that  in  situations  where  codeword 
synchronization  is  important  the  T-Codes  can  be  used  instead  of 
Huffman  codes,  giving  excellent  self-synchronizing  properties  without 
sacrificing  coding  efficiency. 

Background 

When  corruption  occurs  in  a  stream  of  data  which  is  coded  with 
variable  length  codes,  the  decoder  can  lose  track  of  where  codeword 
boundaries  are  located  in  the  data  stream,  and  so  the  effect  of  the 
corruption  can  extend  over  a  large  number  of  received  symbols. 
Variable  length  code  sets  can  be  chosen  such  that  the  receiver  is  able 
to  determine  the  correct  location  of  these  codeword  boundaries 
relatively  quickly  after  a  corruption,  but  this  is  usually  done  by 
choosing codes in which certain bit sequences occur only at the end of
codewords.  The  receiver  must  look  for  these  special  sequences  in  the 
received  data  stream.  In  some  cases  it  is  possible  to  choose  codes 
which  will  self-synchronize  as  a  result  of  the  normal  decoding 
process,  but  these  are  relatively  difficult  to  find. 

T-Codes 

The  original  discovery  of  the  T-Codes  was  published  in  1984  by 
Titchener  [1].  This  work  gave  an  algorithm  for  generating  families  of 
T-Code  code  sets,  and  showed  that  all  code  sets  generated  in 
accordance with this algorithm are self-synchronizing by nature.
These  properties  are  not  derived  from  having  specific  synchronizing 
bit  sequences,  but  rather  the  synchronizing  information  is  spread 
throughout  the  code  as  an  inherent  part  of  its  construction.  The 
construction  of  the  codes  is  such  that  when  codeword  synchronization 
is  lost  as  a  result  of  a  corruption,  the  receiver  will  automatically  re¬ 
synchronize  on  a  subsequent  codeword  boundary  without  any  special 
measures  being  taken.  This  will  happen  even  when  the  loss  or 
corruption  is  not  recognized  as  such.  When  a  data  loss  or  corruption 
is  known  to  have  occurred,  an  algorithm  is  available  for  the  receiver 
to  determine  the  point  at  which  subsequent  codeword  synchronization 
is re-established.

T-Code Construction Algorithm

The T-Code construction algorithm is very simple. Code sets are
constructed by augmenting lower level T-Code code sets, with the
lowest level being the code set 0 and 1. The augmentation process
consists of writing out a list with two copies of the lower level code
set, and then sacrificing a codeword from the first half of the list and
using it as a prefix for every codeword in the second half of the list.
This produces a new code set which has nearly twice the number of
codewords of the lower level code set. An example of this process is
given in Table 1.

Table 1: T-Code Construction Algorithm

Level 0    Prefix 0    Level 1    Prefix 01    Level 2
0          ->          -
1                      1                       1
                       00                      00
                       01         -->          -
                                               011
                                               0100
                                               0101
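
The augmentation step as described in the text can be sketched in a few lines. The function names `augment` and `is_prefix_free` are hypothetical, and the sketch covers only the single-copy augmentation stated above.

```python
def augment(code_set, prefix):
    """One T-Code augmentation step.

    Two copies of `code_set` are listed; `prefix` is removed from the first
    copy and prepended to every codeword of the second copy.
    """
    if prefix not in code_set:
        raise ValueError("the prefix must be a codeword of the lower level set")
    first_half = [w for w in code_set if w != prefix]
    second_half = [prefix + w for w in code_set]
    return first_half + second_half

def is_prefix_free(code_set):
    """True if no codeword is a proper prefix of another codeword."""
    return not any(a != b and b.startswith(a) for a in code_set for b in code_set)

level0 = ["0", "1"]
level1 = augment(level0, "0")    # ['1', '00', '01']
level2 = augment(level1, "01")   # ['1', '00', '011', '0100', '0101']
```

A set with k codewords yields 2k - 1 codewords after one step, which matches the statement that the new set has nearly twice as many codewords.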


T-Code  Synchronization  Properties 

Titchener showed that every T-Code will have self-synchronizing
properties, with typical synchronizing delays of a few codewords, but
it was not until the work by Higgie [2] showed that these
self-synchronizing properties are available without loss of coding
efficiency that the significance of the codes became evident. Any
codeword from a code set can be used as the prefix to produce the
augmented code set at the next level. The length of the prefix chosen
affects the codeword length distribution of the next level code set, so
by careful choice of prefix lengths it is possible to produce T-Codes
which match the codeword length distribution required for efficiently
coding any particular information source (i.e. effectively the same
codeword length distribution as a Huffman code designed for the
source).

Higgie's work [2] also showed that the T-Codes which give maximum
efficiency for any particular information source generally include at
least one which has an average synchronization delay of around 1.5
codewords.
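
The self-synchronizing behaviour can be observed on a small example. The sketch below assumes the five-codeword set obtained with prefixes 0 and 01, a hypothetical message, and a single flipped bit; it demonstrates resynchronization for this one case only and says nothing about average delays.

```python
CODE_SET = ["1", "00", "011", "0100", "0101"]  # T-Code set built with prefixes 0, 01

def decode(bits, code_set=CODE_SET):
    """Greedy decoding of a prefix-free code: emit a codeword as soon as
    the buffered bits match one."""
    words, buf = [], ""
    for b in bits:
        buf += b
        if buf in code_set:
            words.append(buf)
            buf = ""
    return words

def flip(bits, pos):
    """Return `bits` with the bit at index `pos` inverted."""
    flipped = "1" if bits[pos] == "0" else "0"
    return bits[:pos] + flipped + bits[pos + 1:]

message = ["0100", "1", "00", "011", "1", "0101", "1", "00"]
stream = "".join(message)
received = decode(flip(stream, 0))   # corrupt the very first bit
```

Here the corrupted first codeword is decoded as three short codewords, after which the decoder is back on the original codeword boundaries and the rest of the message is recovered without any special resynchronization step.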

Current Research Activity

The work reported in the paper by Higgie [2] used Monte Carlo simulation
techniques to show that it is possible to choose an efficient and
rapidly synchronizing T-Code for any particular application. These
simulations also showed that not all T-Codes are equal in their
synchronization performance and that the task of choosing the best
T-Code for a particular application is not a trivial one. Attempts at
justifying why some T-Codes are better than others have recently led
to a new technique for theoretically determining the average
synchronization delay of T-Codes when they are used efficiently.
This technique offers several advantages over the previously used
Monte Carlo techniques, and provides insight into how the T-Codes
achieve their enviable synchronizing properties.

The theoretical technique is now being used in calculating a database
of the T-Codes which have the best synchronizing performance. It is
hoped that this database will be useful in enabling a user with a
particular information source to choose a T-Code which will be as
efficient as a Huffman code designed for the source, but with an
average synchronization delay of about 1.5 codewords.

Current research is also focusing on the use of T-Codes in FAX
machines and in the JPEG image compression standard, particularly
with respect to transmitting images in these formats over mobile radio
channels. This is only one of many potential application areas, as
T-Codes can be used to advantage in any situation where the probability
of data corruption is high enough to make the use of
non-synchronizing or poorly synchronizing Huffman codes difficult.

Conclusion 

The T-Code generation algorithm has been demonstrated to provide
variable length code sets which have both the desirable properties of
coding efficiency and rapid self-synchronization. For any particular
information source, properly chosen code sets can typically offer
average synchronization delays of 1.5 codewords without sacrificing
coding efficiency compared to that obtained with a Huffman code
designed for the source. This means that it is now possible to use
variable length codes in applications where the probability of
corruption is high and the problems of codeword synchronization
have previously excluded their use.

Kieffer's Sample Converses for Source
Coding

En-hui Yang

Department of Mathematics, Nankai University
Tianjin 300071, P. R. China

Abstract - New proofs of Kieffer's recent sample converses
for source coding are given using a sample path covering idea
originated by Ornstein and Weiss and modified by Shields,
together with Birkhoff's ergodic theorem.

Throughout the paper, we fix a measurable space $(A, \mathcal{A})$ as our source
alphabet and a measurable space $(\hat{A}, \hat{\mathcal{A}})$ as our reproducing alphabet.

For our purposes, a source $\mu$ is a stationary, ergodic process $\{X_i\}$ taking
values in the alphabet $A$. If $x = (x_i)$ is a finite or infinite sequence from
$A$ or $\hat{A}$, let $x_m^n = (x_m, x_{m+1}, \ldots, x_n)$ and, for simplicity, write $x_1^n$ as
$x^n$. Let $\{\rho_n \mid n = 1, 2, \ldots\}$ be a fixed fidelity criterion in which each $\rho_n$
is a measurable function from $A^n \times \hat{A}^n$ to $[0, +\infty)$. Recently, Kieffer [1]
proved the following two sample converses for source coding.

Theorem 1 Assume that the fidelity criterion $\{\rho_n \mid n = 1, 2, \ldots\}$ is
subadditive, i.e., $(n+m)\rho_{n+m}((x_1, x_2), (y_1, y_2)) \le n\rho_n(x_1, y_1) + m\rho_m(x_2, y_2)$,
and that there exists a $y^* \in \hat{A}$ for which $E\rho_1(X_1, y^*) < +\infty$.
Let $R > 0$. Then for any sequence $\{B_n\}$ in which $B_n$ is an nth-order
block code with rate no greater than $R$, we have

$\liminf_{n \to \infty} \rho(B_n) \ge D(R)$, a.s.,

where $\rho(B_n) = \min_{y \in B_n} \rho_n(X^n, y)$ and $D(R)$ is the distortion rate
function of the source $\mu$ relative to the fidelity criterion $\{\rho_n \mid n = 1, 2, \ldots\}$.

Theorem 2 Assume that $\{\rho_n \mid n = 1, 2, \ldots\}$ satisfies
$\rho_{n+m}((x_1, x_2), (y_1, y_2)) \le \max\{\rho_n(x_1, y_1), \rho_m(x_2, y_2)\}$ and that for
each $D > 0$ there exist a countable subset $S_D = \{y_i\} \subset \hat{A}$ and a countable
measurable partition $\{F_i\}$ of $A$ such that $\rho_1(x, y_i) \le D$, $x \in F_i$, for
each $y_i \in S_D$, and $-\sum_i \Pr(X_1 \in F_i) \log \Pr(X_1 \in F_i) < +\infty$. Then for
any sequence $\{C_n\}$ in which $C_n$ is an nth-order variable rate code that
operates at the distortion level $D$, we have

$\liminf_{n \to \infty} r(C_n) \ge R(D)$, a.s.,

where $r(C_n)$ is the sample rate function of $C_n$ and $R(D)$ is the (operational)
rate distortion function of $\mu$ relative to $\{\rho_n \mid n = 1, 2, \ldots\}$.


Theorems 1 and 2 are called the sample converse for source coding at
a fixed rate level and the sample converse for source coding at a fixed
distortion level, respectively. They are the first general sample converses
in source coding theory. For the case of fixed distortion level, Barron [2]
and Shields [3] proved Theorem 2 for the special case in which $A = \hat{A}$ is
finite and $D = 0$, and Ornstein and Shields [4] showed Theorem 2 for the
special case in which $A = \hat{A}$ is finite and $\{\rho_n\}$ is the Hamming fidelity
criterion.

Both proofs of Theorems 1 and 2 given in [1] involve to a great extent
a powerful new ergodic theorem [5]. We present here new proofs of
Theorems 1 and 2 that use only some simple code construction techniques
together with Birkhoff's ergodic theorem and the sample path covering
idea originated by Ornstein and Weiss [6] and modified by Shields [7]. In
fact, the trick lies in how to use the sample path covering idea subtly.
Since both new proofs of Theorems 1 and 2 are almost the same, in what
follows we only give the sketch of the proof of Theorem 1.

Sketch of Proof of Theorem 1: First note that using the code
construction technique outlined in [8], we can deduce from the traditional
block source coding theorem the following result: there exists a sequence
$\{\tilde{B}_n\}$, where $\tilde{B}_n$ is an nth-order block code with rate no greater than
$R$, such that

$\limsup_{n \to \infty} \rho(\tilde{B}_n) \le D(R)$, a.s.

Suppose Theorem 1 is not true. Then there exist an $\epsilon > 0$, a $\gamma > 0$, and
a sequence $\{B_n\}$, where $B_n$ is an nth-order block code having rate $R$ or
less, such that

$\Pr\{\bigcap_{M=1}^{\infty} \bigcup_{n=M}^{\infty} \{x \in A^\infty \mid \rho(B_n)(x^n) < D(R) - \epsilon\}\} > \gamma$.   (1)

Let $B_0 = \{y^*\}$ be a first-order block code with $\rho(B_0) = \rho_1(X_1, y^*)$.
In order to lead to a contradiction, we distinguish two cases: (1) for
any positive real number $a$, $\Pr\{\rho(B_0) > a\} > 0$; (2) there exists a
real $a$ such that $\Pr\{\rho(B_0) > a\} = 0$. Since it is easier to prove Case
(2) than Case (1), we confine ourselves to Case (1). Let $a$ be a positive
real number to be specified later. Let $\delta = \Pr\{\rho(B_0) > a\}$. Obviously,
$\delta \to 0$ as $a \to +\infty$. Fix a positive integer $m$ ($2/m < \delta$) such that if
$D_1 = \{x \in A^\infty \mid \rho(\tilde{B}_m)(x^m) < D(R) + \delta\}$, then $\Pr\{D_1\} > 1 - \delta$. Let
$D = D_1 \cap \{x \in A^\infty \mid \rho(B_0)(x) \le a\}$ (using $D$ instead of $D_1$ is a trick). In
view of (1), fix $M > m/\delta$ and choose $N > M$ so large that if
$C = \bigcup_{i=M}^{N} \{x \in A^\infty \mid \rho(B_i)(x^i) < D(R) - \epsilon\}$, then $\Pr\{C\} > \gamma$. For sufficiently
large $n$, define

$G_n = \{x \in A^\infty \mid (n-N+1)^{-1} \sum_{i=0}^{n-N} 1_D(T^i x) > 1 - 2\delta$ and $(n-N+1)^{-1} \sum_{i=0}^{n-N} 1_C(T^i x) > \gamma\}$,

where $T$ denotes the shift on $A^\infty$ defined by $(Tx)_i = x_{i+1}$, and $1_D(x)$
and $1_C(x)$ are the indicator functions of $D$ and $C$, respectively.

We next associate with each $x \in G_n$ a partition $\{I_i\}_{i=1}^{t}$ of $[1, n]$ into
consecutive sub-intervals. Assume $I_1, \ldots, I_{i-1}$ have been defined and
$\bigcup_{j<i} I_j = [1, u-1]$. Then $I_i$ ($i \ge 1$) is defined according to the following
procedure:

S1 If $T^{u-1}x \notin D \cup C$ or $u > n - N + 1$, then put $I_i = [u, u]$.

S2 Otherwise, test the membership of $T^{u-1}x$ in $C$. If $T^{u-1}x \in C$,
put $I_i = [u, v]$, where $v$ is the least positive integer such that
$\rho(B_{v-u+1})(x_u^v) < D(R) - \epsilon$.

S3 Otherwise, test whether there exists $1 \le l < m$ such that
$T^{u+l-1}x \in C$. If it exists, put $I_i = [u, u]$; if not, put $I_i = [u, u+m-1]$.

The total number of all these partitions can be upper bounded by
$2^{nH(\delta)}$. For each partition $\{I_i\}_{i=1}^{t}$ construct an nth-order block code
$B(I_1) \times \cdots \times B(I_t)$, where $B(I_i) = B_0$ if $|I_i| = 1$, $\tilde{B}_m$ if
$|I_i| = m$, and $B_{|I_i|}$ otherwise. Combining all these block codes together
with $B_0^n$, we get a new nth-order block code $B_n'$ with rate less than
$R + H(\delta) + 1/n$. From the above construction, we can deduce that for
any $x \in G_n$,

$\rho(B_n')(x^n) \le \delta a + D(R) + \delta - \gamma(\epsilon + \delta)/2 + n^{-1} \sum_{i=0}^{n-N} (1 - 1_D(T^i x)) \rho(B_0)(T^i x) + n^{-1} \sum_{i=n-N+1}^{n-1} \rho(B_0)(T^i x)$.

Since $D(R)$ is convex and nonincreasing, finally we can obtain

$D(R) \le \alpha(H(\delta) + 1/n) + \delta a + D(R) + \delta - \gamma(\epsilon + \delta)/2 + (N E\rho(B_0))/n + \int (1 - 1_D(x)) \rho(B_0)(x)\, d\mu + \int_{G_n^c} n^{-1} \sum_{i=0}^{n-1} \rho(B_0)(T^i x)\, d\mu$,

where $-\alpha$ ($\alpha > 0$) is the right derivative of $D(\cdot)$ at $R$. Letting $n \to \infty$
and then letting $a \to \infty$ leads to a contradiction.

Some Preliminary Descriptions

Recently, some video/audio compression algorithms based on
variable-length transform coding have been suggested. However, a common
problem associated with the use of variable-length source codes is
that channel errors may cause the loss of synchronization, which
leads to extended errors in the decoded text. In this paper, a
trial-and-error algorithm with cross-length-checks is proposed for
the partial correction of these error propagations without recourse to
error-correcting codes.

A block of N transform coefficients is divided into m groups of
length n. Over these codeword lengths $cl_{i,j}$ ($i = 1, \ldots, m$; $j = 1, \ldots, n$),
two sets of parity checks are defined as the volume check sums
$VCS_j$ ($j = 1, \ldots, n$) by

$VCS_j = \sum_{k=1}^{m} cl_{k,j} \pmod{L}$

and as the cross check sums $CCS_i$ ($i = 1, \ldots, m+n-1$) by

$CCS_i = \sum_{k=\max(1,\, i-n+1)}^{\min(i,\, m)} cl_{k,\, i-k+1} \pmod{L}$

where L is an integer larger than the maximum codeword length of
the variable-length code used.

Error Propagation Detection Procedure

The lengths in bits of the coefficient groups are transmitted to the
receiver to detect error propagations. After decoding n coefficients,
the receiver is reset to the initial state so as to resynchronize the
decoding procedure when error propagations occur in the past
coefficient group. At the end of decoding each coefficient group, the
test variable $L_i$ ($i = 1, \ldots, m$) of the group length is evaluated by

$L_i = 1$, if the total number of bits read for the i-th group does not
match the transmitted group length;
$L_i = 0$, else.

$L_i = 1$ means that channel errors have occurred in the i-th
coefficient group and caused error propagations.

Indicating the Suspected Causal Codewords

The reason for the occurrence of an error propagation is that the
first codeword in the error propagation is changed by erroneous bits
into another codeword of different length, and the following text
can be decoded as a number of coefficients, which are different
from the originals. In view of this, if we can find these causal
codewords and determine their length, we can correct the error
propagations by resynchronizing the decoding procedure. For this
reason, two sets of test variables, the volume test variables $V_i$
($i = 1, \ldots, n$) and the cross test variables $C_i$ ($i = 1, \ldots, n$), are
calculated by

$V_i = 1$, if the $VCS_i$ calculated at the receiver does not match that received;
$V_i = 0$, else;

and

$C_i = 1$, if the $CCS_i$ at the receiver does not match that received;
$C_i = 0$, else;

with

$V_{\min} = \min\{i : V_i = 1\}$ and $C_{\min} = \min\{j : C_j = 1\}$.

It follows that all the codewords which are the $V_{\min}$-th or
$(C_{\min} - i + 1)$-th codeword and at the same time in the groups with
$L_i = 1$ are regarded as the suspected causal codewords. They can be
divided into two sets, with their index sets represented by

$S_1 = \{(i, V_{\min}) : L_i = 1 \text{ and } i \ge \alpha\}$,

and

$S_2 = \{(i, C_{\min} - i + 1) : L_i = 1 \text{ and } i < \alpha\}$,

where $\alpha$ can be obtained by

$\alpha = C_{\min} - V_{\min} + 1$.


Determining the Length of Suspected Causal Codewords

In order to resynchronize the decoding procedure, the length of
the suspected causal codewords must be evaluated by using the
received VCS's and CCS's. For $cl_{i,j}$ in $S_1$, the codeword length $L_v(i)$ is

$L_v(i) = VCS_{V_{\min}} - \sum_{k=1,\, k \ne i}^{m} cl_{k,\, V_{\min}} \pmod{L}$,

and for $cl_{i,j}$ in $S_2$, $L_c(i)$ is

$L_c(i) = CCS_{C_{\min}} - \sum_{k=\max(1,\, C_{\min}-n+1),\, k \ne i}^{\min(C_{\min},\, m)} cl_{k,\, C_{\min}-k+1} \pmod{L}$.

Main Procedure

The suspected causal codewords are searched with the check
sums VCS's and CCS's. For one of them, the length $L_v(i)$ or $L_c(i)$
is evaluated, and the decoding of the associated coefficient group is
resynchronized and performed once again. The length check $L_i$ is
used to determine the success of the correction. This trial-and-error
procedure is iteratively performed for all suspected causal
codewords.
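
The check sums and the length recovery can be sketched as follows. This is a simplified illustration, not the paper's procedure: indices are 0-based, the function names are hypothetical, and exactly one codeword length in the table is assumed to be wrong, whereas a real error propagation also disturbs the lengths that follow it within the same group.

```python
def check_sums(cl, L):
    """Volume (column) and cross (anti-diagonal) check sums, modulo L,
    of an m-by-n table of codeword lengths."""
    m, n = len(cl), len(cl[0])
    vcs = [sum(cl[k][j] for k in range(m)) % L for j in range(n)]
    ccs = [0] * (m + n - 1)
    for k in range(m):
        for j in range(n):
            ccs[k + j] = (ccs[k + j] + cl[k][j]) % L  # diagonal index k + j
    return vcs, ccs

def length_from_vcs(cl_rx, vcs_tx, i, j, L):
    """Length of codeword (i, j) implied by the transmitted VCS of column j,
    assuming all other lengths in that column were decoded correctly."""
    others = sum(cl_rx[k][j] for k in range(len(cl_rx)) if k != i)
    return (vcs_tx[j] - others) % L

def length_from_ccs(cl_rx, ccs_tx, i, j, L):
    """Length of codeword (i, j) implied by the transmitted CCS of its
    anti-diagonal, assuming the other lengths on it are correct."""
    d, n = i + j, len(cl_rx[0])
    others = sum(cl_rx[k][d - k] for k in range(len(cl_rx))
                 if k != i and 0 <= d - k < n)
    return (ccs_tx[d] - others) % L

# Transmitter side: m = 3 groups of n = 3 codeword lengths, L > max length.
sent = [[3, 5, 2], [4, 1, 6], [2, 2, 7]]
L = 8
vcs_tx, ccs_tx = check_sums(sent, L)

# Receiver side: codeword (1, 1) was decoded with length 3 instead of 1.
received = [[3, 5, 2], [4, 3, 6], [2, 2, 7]]
vcs_rx, ccs_rx = check_sums(received, L)
```

Because L exceeds every codeword length, the residue modulo L identifies the true length uniquely once the other terms of the sum are known.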

A sample recursive method of co-optimizing source and channel coding
in digital transmission of analog signals is presented. The procedure is
an iterative approach addressing both the adjustment of the
reproduction vectors according to channel errors, as well as the
problem of index assignment.

1.  Introduction 

Vector  quantization  (VQ)  plays  an  important  role  as  a  source  encoder  in 
digital  transmission  of  analog  signals.  The  difficulty  of  designing  high 
dimensional  VQs  is  a  severe  obstacle  for  practical  usage.  Besides  search 
and  storage  difficulties,  the  two  main  problems  in  the  design  are  how  to 
distribute  the  reconstruction  vectors  over  the  source-space,  and  how  to 
choose the code words, or indices, so that the effect of channel errors on the
performance  is  minimized.  Traditionally,  the  two  problems  are  treated 
separately,  but  there  is  a  current  trend  to  regard  them  as  one  and  minimize 
the  distortion  at  the  receiver  using  zero-redundant  VQs,  i.e.,  the  additional 
bits  normally  incorporated  for  error  protection  are  employed  to  refine  the 
quantizer  without  explicit  error  protection.  A  VQ  trained  with  the  LBG 
algorithm  can  be  very  sensitive  to  channel  errors  due  to  its  tendency  for 
random ordering of the indices. Index ordering, known as the Index
Assignment (IA) problem, is an important part of the VQ design.
Unfortunately, finding the optimal IA belongs to the class of NP-complete
problems.

IA is discussed for scalar quantizers in [1]. More recent work is [2] or [3]
and [4], where the IA is a post-process to the VQ design. In [5] the IA is
incorporated in the LBG algorithm and thereby is a co-optimization of the
source and channel coding.

One  difficulty  with  Channel  Optimized  VQ  (COVQ)  is  that  the  channel 
error  probability  is  a  design  parameter  in  the  optimization.  In  a  real 
transmission  situation,  this  parameter  is  difficult  to  estimate.  It  may  even 
vary  in  time,  making  the  design  according  to  a  specific  value  rather 
academic.  More  important  is  how  robust  the  design  is  to  a  mismatch 
between the actual error probability, $q$, and the design parameter $\epsilon$.

2.  The  method 

The iterative approach suggested in this paper is thoroughly described in
[6] and is only recapitulated here. Let $\alpha_i(x)$ denote the total squared error
distortion associated with choosing the $i$:th of the $M = 2^L$ reconstruction
vectors, $y_i$, given an observation $x$, i.e.,

$\alpha_i(x) = \sum_{j=0}^{M-1} \|x - y_j\|^2 \, p_{j|i}$   (1)

The total distortion to be minimized can be written

$D = \sum_{i=0}^{M-1} p_i \, E[\alpha_i(X) \mid X \in K_i]$   (2)

where the optimal partitioning of the signal space gives the regions

$K_i = \{x \in \mathbb{R}^d : \alpha_i(x) \le \alpha_j(x) \ \forall j\}$   (3)

where $p_{j|i}$ is the probability of receiving the index $j$ given that $i$ was sent
and where $p_i$ is the probability of $X \in K_i$.

The iterative algorithm adjusts all reproduction vectors after each
observation $x_t$. The reproduction vector causing minimum expected distortion at
the receiver is voted winner and is denoted $y_l$,

$l = \arg\min_i \{\alpha_i(x_t)\}$   (4)

The individual adjustments, or step sizes $\gamma_j$, depend on the probability of
interchanging, due to channel errors, the code word $j$ with the winner $l$,

$\gamma_j = f(t) \, p_{j|l}$   (5)

where $f(t)$ is an annealing function with $f(T) = 0$, $T$ being the
predetermined training time.

The sample vector updating formulas, in conjunction with steepest descent,
become

$y_j^{t+1} = y_j^t + \gamma_j (x_t - y_j^t) \ \forall j$   (6)
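
Equations (4) to (6) can be sketched as a short training loop. The sketch makes several assumptions that are not stated in the text: the channel is a binary symmetric channel with independent bit errors, the annealing function is linear, the initial codebook is drawn at random from the training data, and the names `bsc_transition`, `train`, and `f0` are hypothetical.

```python
import random

def bsc_transition(num_bits, q):
    """p[i][j] = Pr(index j received | index i sent) over a binary symmetric
    channel with bit error probability q (independent bit errors assumed)."""
    size = 1 << num_bits
    def hamming(a, b):
        return bin(a ^ b).count("1")
    return [[q ** hamming(i, j) * (1 - q) ** (num_bits - hamming(i, j))
             for j in range(size)] for i in range(size)]

def sq_dist(x, y):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def train(samples, num_bits, q, f0=0.5, seed=0):
    """Sample-recursive codebook training following equations (4)-(6)."""
    rng = random.Random(seed)
    p = bsc_transition(num_bits, q)
    size = 1 << num_bits
    y = [list(rng.choice(samples)) for _ in range(size)]  # assumed initialization
    T = len(samples)
    for t, x in enumerate(samples):
        # (1), (4): the winner minimizes the expected distortion at the receiver
        alpha = [sum(p[i][j] * sq_dist(x, y[j]) for j in range(size))
                 for i in range(size)]
        winner = min(range(size), key=alpha.__getitem__)
        f_t = f0 * (1 - (t + 1) / T)   # assumed linear annealing, zero at the end
        for j in range(size):
            step = f_t * p[winner][j]  # (5): probability of receiving j for the winner
            y[j] = [yj + step * (xi - yj) for yj, xi in zip(y[j], x)]  # (6)
    return y
```

With q = 0 the transition matrix is the identity, so only the winner moves and the loop reduces to an ordinary competitive-learning update; with q > 0 the neighbours of the winner in Hamming distance are pulled along as well.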

3.  The  structure  of  a  robust  quantizer 

A VQ optimized according to (3) with $\epsilon > 0$ becomes more conservative,
in the sense that the reproduction vectors are shifted towards the center
of mass of the information source, than a VQ trained without bit errors, as
can be seen e.g. in Fig. 1b and 1c. In this figure, code vectors are connected
with a line if their code words are Hamming-1 neighbors. Fig. 1a) and c)
show the VQs designed with $\epsilon = 0.00$ and $\epsilon = 0.05$ respectively. The
structural ordering in c) compared to a) is striking and is decisive for the
robustness if $q \ne \epsilon$ (i.e. design mismatch). Figure 1b) shows the same VQ
as in a) but with an ordering of the indices. Fig. 1d) shows the inherent
robustness of a good IA to design mismatch.


Figure 1 a-d) Examples of the robustness to error mismatch for three
VQs. The VQs in a) and b) have identical reconstruction vectors,
trained with $\epsilon = 0$, but an IA procedure is added to the design in b).
Hamming-1 neighbors are connected with a line. The VQ in c) is
trained with $\epsilon = 0.05$. d) shows the performance of a-c), for a BSC
when the channel error varies from $q = 0$ to $q = 0.05$.

4.  Experiments 

The  capability  of  the  algorithm  is  thoroughly  investigated  using  a  number 
of  Gauss-Markov  sources  in  2,3,4  and  8  dimensions.  The  following  table 
presents  a  few  of  the  results,  comparable  to  the  tests  in  [5],  at  the  rate  1 
bit/sample. 

Table I. Experimental results obtained using 2 and 4 dimensional
Gauss-Markov sources with correlation factors of 0.0 and 0.9. The
training/evaluation  sets  are  large  (2  ■  10°  II-  Hr)  and  the  channel 
is  binary  symmetric. 


N    Bit error    Corr=0.0    Corr=0.9

2    0.00         4.40        7.91
2    0.05         3.15        4.71
2    0.10         2.27        3.34
4    0.00         4.66        10.2
4    0.05         3.15        6.05
4    0.10         2.28        4.52

5.  Conclusions 

The optimization method presented in this paper has proved to yield
structured solutions, close to linear mappings of the hypercube, which
offer both a good VQ and an accurate IA. The design parameter can be
decreased to zero during training, thereby obtaining a VQ that is robust
due to the incorporated IA. The results obtained indicate that the algorithm
compares favourably with others.
Abstract

Low  bit  rate,  high  quality  speech  coding  is  a  vital  part  in  voice  telecom¬ 
munication  systems.  The  introduction  of  CELP  (1984,  Codebook  Ex¬ 
cited  Linear  Prediction)  speech  coding  provides  a  feasible  way  to  compress 
speech  data  to  4.8  kbps  with  high  quality.  However,  the  formidable  com¬ 
putational  complexity  required  for  real-time  processing  has  prevented  its 
wide application. Using our codebook, we reduce the computational complexity
of the codebook search, which originally accounts for 2/3 of the computational
complexity, to almost nothing, while preserving the same good
speech quality. This tremendous reduction in computational complexity is
achieved  by  replacing  the  traditional  stochastic  codebook  by  an  artifi¬ 
cially  constructed  deterministic  codebook.  After  a  careful  study  of  the 
minimization  of  vector  quantization  distortions,  we  found  that  although 
“randomness”  is  usually  observed  in  speech  residuals;  it  is  not  necessary 
to  use  a  noise-like  stochastic  codebook  to  encode  the  speech  residuals.  As 
long as the code vectors are distributed uniformly over a sphere, very
small VQ errors can be achieved. The most significant advantage of using
this deterministic codebook is an extremely fast codebook search. After this
reduction, the algorithm requires about 5 MIPS. It can be handled by even
inexpensive DSP chips, while maintaining the same high quality. Besides
extremely simple encoding and decoding schemes, this codebook also provides
an optimal error-tolerance property, and it doesn’t require codebook
storage. We hope our contribution can finally make CELP speech coding
a  widely  applicable  technology. 


“Texas  Instruments 

I  Martin  Marietta  Chair  in  Systems  Engineering 




LOSSLESS COMPRESSION ALGORITHMS FOR HIGH FIDELITY AUDIO COMPRESSION
1.1 Summary

Real-time  algorithms  for  the  compression  of 
high-fidelity  audio  are  presented.  The  goal  of  these  al¬ 
gorithms  is  to  provide  a  compact,  high  fidelity,  digital 
representation  for  an  input  stream  of  audio  samples. 
We are developing an adaptive transform coding system
that consists of five specialized functional blocks:
An  octave-based  subband  decomposition  signal  trans¬ 
former  [1,2],  a  bank  of  adaptive  quantizers  assisted  by 
a  bit  allocator,  and  a  lossless  compressor  coupled  with 
a  buffer. 


Figure  1.  High-Fidelity  Audio  Compression  System 

This  paper  concerns  fast  adaptive  algorithms 
for  the  lossless  compression  stage.  While  the  tech¬ 
niques  described  herein  can  be  applied  to  any  type 
of  data,  we  specifically  seek  efficient  compression  of 
digital  audio.  We  also  give  results  from  entropy  esti¬ 
mation  experiments  for  a  digital  audio  source. 

A  variable-to-variable  length  algorithm  for 
lossless  compression  is  presented.  This  algorithm  con¬ 
sists  of  two  stages:  A  variable-to-fixed  length  coder 
that  is  based  on  tree-building  algorithms  in  the  style 
of  Lempel-Ziv,  followed  by  a  fixed-to-variable  length 
arithmetic  coder  [3].  The  first  stage  parses  source 
symbols  into  a  fixed  number  of  frequently  occurring 
strings.  This  dictionary  of  strings  varies  over  time, 
and  an  adaptive  procedure  that  tracks  the  source's 
behavior  is  outlined  in  the  talk.  An  arithmetic  coder 
takes  the  labels  of  these  strings,  and  their  associated 
frequencies  of  occurrence  from  the  dictionary  coder, 
and generates a compressed, variable-length representation.
We compare the performance of this strategy
against that of a variable-to-fixed coder working
alone, and that of a fixed-to-variable coder working alone.
While variable-to-fixed coders are effective at exploiting
correlation between source outputs, they are limited
in practice by constraints on data structure size (finite
table length). Limitations on compression caused
by  this  phenomenon  are  obviated  by  the  introduction 
of  the  arithmetic  coder.  In  a  sense,  the  structure  that 
we  propose  resembles  the  paradigm  of  a  statistical 
coder (the arithmetic coder) coupled with an adaptive
source modeler. Similar structures that use Dynamic
Markov  Modeling  for  the  source  modeler  have  been 
proposed  by  [3,4].  These  schemes  are  hampered  by 
the fact that the adaptation algorithms for developing
the source model can cause it to grow unmanageably.
While the variable-to-fixed length front-end of our system
implicitly models the source, the data
structure  evolves  to  accommodate  source  variations 
in  a  bounded  fashion.  Results  of  our  technique  ap¬ 
plied  to  digital  audio  requantized  via  ADPCM  quan¬ 
tizer  banks  discussed  above  are  provided. 
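
The first stage described above, a variable-to-fixed length parser whose dictionary stays bounded, might be sketched as follows. This is an LZW-style illustration only and not the authors' adaptive procedure: the function names, the `max_entries` bound, and the policy of simply freezing the dictionary once it is full are all assumptions made for the example. The label frequencies it returns are the kind of side information a second-stage arithmetic coder (not shown) could use.

```python
from collections import Counter


def parse_variable_to_fixed(symbols, alphabet, max_entries=16):
    """LZW-style parse: emit one fixed-width label per dictionary string.

    Hypothetical sketch: the dictionary stops growing at max_entries,
    which keeps the data structure bounded.
    """
    # Seed the dictionary with every single-symbol string.
    dictionary = {(s,): i for i, s in enumerate(alphabet)}
    labels = []
    current = ()
    for s in symbols:
        candidate = current + (s,)
        if candidate in dictionary:
            current = candidate  # keep extending the current match
            continue
        labels.append(dictionary[current])
        if len(dictionary) < max_entries:
            dictionary[candidate] = len(dictionary)  # learn a new string
        current = (s,)
    if current:
        labels.append(dictionary[current])
    # Label frequencies are what a statistical coder would model.
    return labels, Counter(labels), dictionary


def decode_labels(labels, dictionary):
    """Invert the parse using the final dictionary (illustration only)."""
    inverse = {i: string for string, i in dictionary.items()}
    out = []
    for label in labels:
        out.extend(inverse[label])
    return out
```

For instance, parsing the eight symbols of "abababab" over the alphabet "ab" yields five labels, and `decode_labels` recovers the input. A real decoder would rebuild the dictionary on the fly rather than receive it.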

Further, we provide results of entropy estimation
experiments performed on a digital audio source.
These  were  performed  to  better  understand  the  be¬ 
havior  of  these  sources.  Our  estimator  is  based  on 
an  extension  of  the  techniques  proposed  in  [5].  This 
work  forms  the  basis  for  designing  the  variable-to- 
fixed  coder  described  above.  The  entropy  estimates 
are  compared  to  entropy  estimates  calculated  using 
marginal probabilities from the same source output.
In  short,  we  illustrate  an  efficient  compression 
strategy  that  allows  us  to  pick  up  extra  compression 
gain  in  our  audio  compression  system.  This  strategy 
is  adaptive  and  dynamically  changes  to  respond  to 
idiosyncrasies in the non-stationary audio source that
we  are  tracking. 
The error probability of the usual linear
modulations (M-PSK or M-QAM) in the Rayleigh fading
channel (RFC) varies as the inverse of the signal-to-noise
ratio. To increase the slope of the error curve, a diversity
technique or an error-correcting code combined with
interleaving can be added. Diversity systems are
costly in spectral efficiency in the case of frequency diversity,
or involve additional complexity if multiple antennas are
used. Trellis Coded Modulations (TCM) are an efficient
way of achieving good performance without loss of spectral
efficiency. However, the construction of well-suited
TCM codes becomes a very difficult task when M-QAM
modulation schemes (M>16) are used. Here, we consider
the design of constellations matched to the RFC. We
search for n-dimensional (n ≥ 2) lattices which can provide
a diversity of order n in the RFC, without the addition of
diversity techniques or TCM.

The main features of our approach are very much
the same as for the AWGN channel; we first determine a
metric that measures the separation of channel symbols, as Euclidean
distance does in the AWGN channel. A careful
appreciation of what a constellation matched to a channel
means leads us to define a theoretical frame for the lattice
coding problem: instead of searching for packing lattices, we
look for "admissible" lattices with respect to a body
specific to the considered channel, a concept derived from
the geometry of numbers. At high SNR, the distance
function of the RFC simplifies and becomes


$d(x,y) = |N(x-y)|$, where $N(x) = \prod_{i=1}^{n} x_i$; we address
the lattice coding problem in that case. The corresponding
body is homothetic to $S = \{x \mid d(x,0) \le 1\}$, and we look

for S-admissible lattices. The distance function shows
that an S-admissible lattice should not possess two
vectors that have the same value in any component; this
feature should provide an nth-order diversity. Under
certain conditions N is the algebraic norm of some real
number field K of degree n; in that case, the embedding
of the ring of algebraic integers of K in R^n provides an
S-admissible lattice. Hence, number field theory enables
us to define a procedure to find dense n-dimensional
lattices matched to the RFC: 1) find a totally real
algebraic number field K of degree n with a small
absolute discriminant; 2) determine an integer basis of
K. The densest n-dimensional lattices using this
technique are known when 2 ≤ n ≤ 8. Hence, we have
at hand a family of n-dimensional lattices which provide
nth-order diversity in the RFC. For a normalised rate of
p = 2 bits/dim, the gain is in the range of 10 to 15 dB, at

.a 

a  symbol  error  rate  of  10  ,  compared  to  16-QAM. 
Besides, p can be easily increased, as any subset of an
S-admissible lattice matches the RFC. Finally we address
detection; the maximum likelihood decoder selects the
channel symbol minimising the channel metric. A
detection algorithm is presented which provides the same
performance as the exhaustive search.
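
As a small illustration of the number-field construction, the sketch below takes the real quadratic field Q(√5) as an assumed example for n = 2 (it is not necessarily the field used by the authors). It embeds the algebraic integers a + bφ, with φ the golden ratio, into the plane via the two real embeddings, and checks numerically that the product of the coordinates of every nonzero lattice point equals the integer algebraic norm a² + ab − b², so its absolute value is at least 1. Since differences of lattice points are lattice points, no two distinct points share a coordinate. The function names and the search radius are illustrative choices.

```python
import math
from itertools import product

SQRT5 = math.sqrt(5.0)
PHI = (1 + SQRT5) / 2       # golden ratio, a root of x^2 - x - 1
PHI_CONJ = (1 - SQRT5) / 2  # its algebraic conjugate


def embed(a, b):
    """Canonical embedding of a + b*phi into R^2."""
    return (a + b * PHI, a + b * PHI_CONJ)


def algebraic_norm(a, b):
    """Norm of a + b*phi, i.e. the product of its two conjugates."""
    return a * a + a * b - b * b


def min_product_distance(radius=6):
    """Smallest |x1 * x2| over nonzero lattice points with |a|, |b| <= radius."""
    best = None
    for a, b in product(range(-radius, radius + 1), repeat=2):
        if (a, b) == (0, 0):
            continue
        x1, x2 = embed(a, b)
        prod = abs(x1 * x2)
        # The coordinate product must agree with the integer norm.
        assert abs(prod - abs(algebraic_norm(a, b))) < 1e-9
        best = prod if best is None else min(best, prod)
    return best
```

Calling `min_product_distance()` returns a value numerically equal to 1, attained for example at the point a = 1, b = 0.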
The error performance of a trellis modulation (or TCM) code
over the Rayleigh fading channel depends strongly on the minimum
symbol and product distances of the code [1-2]. Both these distances
should be as large as possible, and they play different roles in determining
the error performance of the code. At low SNR, the minimum
product distance is more important, whereas at high SNR, the minimum
symbol distance becomes more important. Apart from these
two distance parameters, the path multiplicity (or error coefficient)
is also an important factor in determining the error performance of
a code at low SNR, and it should be kept as small as possible.

The multilevel coding method devised by Imai and Hirakawa [3]
is a powerful technique for constructing bandwidth-efficient modulation
codes from Hamming distance component codes in conjunction
with proper bits-to-signal mapping through set partitioning [4].
This method has been used to construct both trellis and block modulation
codes for the AWGN channel. In this paper, the multilevel
coding method is used to construct multilevel multi-dimensional trellis
MPSK modulation codes for the Rayleigh fading channel. This
method allows us to coordinate all the important parameters of a
code such that no single parameter severely degrades the performance
of the code. A specific construction method is proposed. In this
method, the minimum symbol and product distances of a multilevel
trellis MPSK code are expressed in terms of the minimum Hamming
distances of its component codes and the intra-set distances of the
signal constellation and its subspaces. In the construction of a code,
all the factors which affect the code performance and its decoding
complexity are considered. Good codes have been constructed. The
error performances of some of these codes based on both one-stage
optimum decoding and multi-stage suboptimum decoding have been
simulated. The simulation results show that these codes achieve good
error performance with small decoding complexity.

As an example, consider the two-level coding scheme in which
the first component code is a two-state four-dimensional binary trellis
code and the second component code is a 4-state eight-dimensional
QPSK trellis code. To construct the first component code, the
single-parity-check (4, 3, 2) code is partitioned into 4 cosets by
the repetition (4, 1, 4) code as follows: A0 = {0000, 1111}, A1 =
{1100, 0011}, A2 = {1010, 0101}, and A3 = {0110, 1001}. The inter-set
distance between Ai and Aj (i ≠ j) is 2, and the intra-set distance
of Ai (i = 0, 1, 2, 3) is 4. Using a two-state trellis code with trellis
structure as shown in Figure 1, the resultant code has minimum Hamming
distance 4. The second component code is formed as follows:
Use a one-to-one mapping from the QPSK signal set to GF(4). A
four-dimensional set partition chain can be obtained by using extended
Reed-Solomon codes. Let RS(n, d) denote a Reed-Solomon code with
block length n and minimum Hamming distance d. Form a set partition
chain, RS(4, 1)/RS(4, 2)/RS(4, 3)/RS(4, 4)/{0}, where 0 is
the all-zero vector. A linear, partial-unit-memory four-state trellis
code over GF(4) can be obtained as shown in Figure 2, where D
denotes a buffer with a unit-time delay. This code has Hamming
distance 3. Combining the above two trellis codes, we obtain an
8-state eight-dimensional trellis 8-PSK code with minimum symbol
distance 3, minimum product distance 8, and information rate 2
bits/symbol. Each state transition has 32 parallel branches. Figure
3 shows the simulation results on bit error performance of the code.
The performance of the code is better than that of the 8-state Ungerboeck
code at Eb/N0 > 13 dB. The normalized branch complexity of this
code is only half that of the 8-state Ungerboeck code. In Figure 3, we
also include the simulation results of Divsalar and Simon’s 4-state
four-dimensional code with R=2.0 bits/symbol [2]. It turns out that
the performance of Divsalar and Simon’s code is worse than that of
the 8-state Ungerboeck code although its symbol distance is not less
than that of the Ungerboeck code. This is because the Ungerboeck code
has a better distance spectrum.
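
The coset partition used for the first component code can be checked with a few lines of code. The sketch below is only a verification aid, with helper names chosen for the example: it enumerates the single-parity-check (4, 3, 2) code, splits it into cosets of the repetition (4, 1, 4) code, and computes the intra-set and inter-set Hamming distances quoted in the example.

```python
from itertools import combinations, product


def hamming(u, v):
    """Hamming distance between two equal-length binary tuples."""
    return sum(a != b for a, b in zip(u, v))


# Single-parity-check (4, 3, 2) code: all even-weight words of length 4.
spc = [w for w in product((0, 1), repeat=4) if sum(w) % 2 == 0]
# Repetition (4, 1, 4) code.
repetition = [(0, 0, 0, 0), (1, 1, 1, 1)]


def cosets(code, subcode):
    """Partition `code` into cosets of the linear `subcode`."""
    seen, result = set(), []
    for w in code:
        if w in seen:
            continue
        coset = sorted(tuple(a ^ b for a, b in zip(w, s)) for s in subcode)
        seen.update(coset)
        result.append(coset)
    return result


parts = cosets(spc, repetition)
# Minimum distance inside each coset (intra-set distance).
intra = [min(hamming(u, v) for u, v in combinations(c, 2)) for c in parts]
# Minimum distance between words of different cosets (inter-set distance).
inter = min(
    hamming(u, v)
    for c1, c2 in combinations(parts, 2)
    for u in c1
    for v in c2
)
```

Here `parts` holds four cosets of two words each, every entry of `intra` is 4, and `inter` is 2, in agreement with the distances stated for A0 to A3.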
Figure 1. The trellis structure of a 2-state rate-1/2 trellis code


Figure 2. An encoder of a 4-state convolutional code over GF(4)


Figure 3. The simulation results of the BER performance: Curves 1, 2, and
3 are the performances of the 4-state Divsalar and Simon's code,
the 8-state Ungerboeck code, and the example code respectively.
Abstract

A new scheme is presented to improve the error performance of
modulation and trellis coded modulation on slow fading channels.
This technique consists in permuting coordinates of multidimensional
constellations on interleaved channels. This permutation ensures
that each coordinate of the signal is faded independently.
Permutation alone does not ensure good improvements; the choice of
a particular rotation of the signal set is also critical to obtain performance
gains. A number of uncoded and coded modulation schemes
have been found to be improved with permutation. Theoretical and
simulation results show that this simple permutation provides gains
of up to 4 dB with uncoded modulation. For trellis coded modulation,
gains of 5 dB were achieved for a 64 state rate 3/4 convolutional
code and 16 QAM modulation.

Summary 

In this paper, we present a scheme that improves the
error performance on slow fading channels. This technique
is used with interleaving; it consists in permuting the coordinates
of the transmitted signals. This permutation may be
viewed as an interleaving at the level of individual coordinates.
It improves the error performance by transmitting
signals such that they are not completely deteriorated by
fades.

Permutation of coordinates may be used with or without
coding. It is simply a different kind of interleaving that
provides up to 5 dB gains as compared to usual interleaving.
This technique has been explored both theoretically and with
computer simulations. It is shown that particular rotations of
the signal set give good performances. Hence some care must
be taken when choosing a coded or uncoded modulation technique
when a permutation of coordinates is used.

I. Permutation of Coordinates for Uncoded Modulation

The technique under study is most easily described as
an interleaving of the individual coordinates of subsequent
signals to be transmitted. Suppose, for example, that we have signals
(X1, Y1), (X2, Y2), (X3, Y3), (X4, Y4), taken in a constellation of two
dimensions. After interleaving, the signals actually being sent
may be (X1, X2), (X3, X4), (Y1, Y2), (Y3, Y4). The received
signals are corrupted by fades and noise; the coordinates
are then reordered before the usual decisions are taken. If the
interleaving is such that the coordinates of each pair (Xi, Yi) are
faded independently, the error performance may be significantly
improved. If fades are independent on each coordinate,
the average error performance is improved since averaging is
carried out with two independent variables instead of one. In
this paper, two-dimensional constellations are used throughout,
but the scheme is easily applicable to other multidimensional
constellations.
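
The four-signal example above can be written as a short sketch. The block layout (all first coordinates followed by all second coordinates, then re-paired) is chosen to reproduce that example; the function names are hypothetical, and a practical system would presumably use a much longer block so that the two coordinates of each original signal see independent fades.

```python
def interleave_coordinates(signals):
    """Re-pair coordinates so X_i and Y_i travel in different channel symbols."""
    xs = [x for x, _ in signals]
    ys = [y for _, y in signals]
    flat = xs + ys
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def deinterleave_coordinates(received):
    """Restore the original (X_i, Y_i) pairing at the receiver."""
    flat = [c for pair in received for c in pair]
    half = len(flat) // 2
    return list(zip(flat[:half], flat[half:]))
```

With the input [(X1, Y1), (X2, Y2), (X3, Y3), (X4, Y4)], `interleave_coordinates` gives [(X1, X2), (X3, X4), (Y1, Y2), (Y3, Y4)], and `deinterleave_coordinates` restores the original pairs.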

The  usefulness  of  permutation  is  now  presented  by 
deriving  bounds  on  the  error  performance  with  and  without 
permutation. For high average signal-to-noise ratios, y„, the
error probabilities of 16 QAM signals transmitted over a Rayleigh
fading channel are respectively approximated as:


N 

f 

1- 

T. 

(1)  Pe  •  1 

1- 

y. 

l  N 

W  ♦  T. 

8 

1  N 

where eq. (1) holds without permutation and eq. (2) is used for
independent fades on each coordinate (i.e. permuted coordinates)
and a 45° rotation of the signal set, as in Figure 1 b).
As shown in Figure 1, this rotation improves the error probabilities
by increasing the distances between received faded
signals. By comparing both equations, we find that more than
4 dB is gained with a permutation of coordinates. The same
technique may be used for 8 PSK modulation, and the energy
gain provided by the permutation is now greater than 3 dB.
It is to be noted that a rotated 16 QAM signal set, once permuted,
gives a 49 QAM transmitted signal. Hence, the performance
improvement may also be viewed as a consequence of
an increase in the complexity of the signal set.
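
The remark about the 49-point transmitted signal set can be checked numerically. The sketch below assumes the usual 16 QAM grid with levels ±1, ±3 on each axis (the scaling is an assumption; only the counts matter). After a 45° rotation each coordinate takes 7 distinct values, and since permutation pairs coordinates drawn from different signals, any of the 7 × 7 = 49 combinations can be transmitted.

```python
import math
from itertools import product

levels = (-3, -1, 1, 3)                  # assumed 16 QAM amplitude levels
qam16 = list(product(levels, repeat=2))  # the 16 constellation points


def rotate(point, degrees):
    """Rotate a 2-D point counter-clockwise by the given angle."""
    t = math.radians(degrees)
    x, y = point
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))


def distinct_levels(points, axis, digits=9):
    """Distinct coordinate values along one axis, up to rounding."""
    return sorted({round(p[axis], digits) for p in points})


rotated = [rotate(p, 45) for p in qam16]
in_phase = distinct_levels(rotated, 0)
quadrature = distinct_levels(rotated, 1)
transmitted_points = len(in_phase) * len(quadrature)
```

Without rotation, `distinct_levels(qam16, 0)` has only 4 entries, so permutation alone leaves the transmitted constellation at 16 points.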

II. TCM with Permutation of Coordinates

With Trellis Coded Modulation (TCM), the same technique
may be used to improve performances. Many 64 state
TCM schemes [1] were simulated for fully interleaved slow
Rayleigh  fading  channels.  Two  results  are  of  particular  inter¬ 
est,  at  a  BER  of  10^,  the  permutation  of  coordinates  for  the 
rate  2/3, 8  PSK,  TCM  provides  an  additional  1.8  dB  gain  and 
for  the  rate  3/4,  16  QAM,  TCM  a  5  dB  improvement  is  ob¬ 
tained.  For  the  rate  3/4, 16  QAM  TCM  scheme  the  best  gain 
is  obtained  without  rotation.  Note  that  without  coding,  a 
rotation of the signal set was necessary to improve the average
intersignal distance. It is quite surprising that such a
rotation  is  not  needed  with  coding.  This  may  be  explained  by 
an  inappropriate  mapping  with  rotated  signals,  but  other 
mappings were simulated without improving these performances.
The 16 QAM performances are particularly noteworthy,
since permutation without rotation does not change the
transmitted  signal  constellation.  Hence  by  simply  interleaving 
the  coordinates,  improvements  of  5  dB  are  possible,  without 
changing  other  aspects  of  the  transmission  link. 

[1] G. Ungerboeck, "Channel Coding with Multilevel/Phase Signals",
IEEE Trans. on Inf. Th., IT-28, January 1982.
Abstract

This  paper  analyzes  performance  limits  of  coded 
multilevel  differential  PSK  (MDPSK)  in  frequency 
selective  Rayleigh  fading  channels.  It  is  assumed  either 
that  interleaving  degree  is  large  enough,  or  that  there 
is  a  sufficient  bandwidth  for  frequency  hopping,  to 
randomize  the  burst  errors  produced  by  fading.  The 
channel  cutoff  rate  of  MDPSK  is  calculated  based  on  the 
“Gaussian  metric”;  AWGN,  co-channel  interference  and 
multipath  channel  delay  spread  are  taken  into  account. 
For  practical,  reliable  communications  over  cellular 
mobile  radio  systems  employing  coded  MDPSK,  the 
three  optimal  information  bit  rates  that  achieve 

1) minimum required average signal energy per
information bit-to-noise power spectral density ratio
(Eb/No),

2) maximum tolerable rms delay spread τrms,

3) maximum spectrum efficiency

are  determined  from  the  channel  cutoff  rate.  It  is  shown 
that  without  fading  frequency  selectivity,  the  optimal 
information  bit  rate  (  =  information  bits  /MDPSK 
symbol)  which  minimizes  the  required  average  Eb/No  is 
around  0.25  information  bits  /symbol  for  2DPSK,  and 
0.4 bits /symbol for other MDPSK schemes with M ≥ 4.

In frequency selective fading, the lower the rate of
codes for error correction, the higher the channel
symbol rate for a given information bit rate 1/Tb, and
the more sensitive the transmission performance becomes
to the fading frequency selectivity. It is shown that a
larger τrms/Tb value can be tolerated with larger values
of M. The optimal code rate for 32DPSK is around 0.3
(1.5 information bits /symbol), and the maximum
τrms/Tb value is 1.5.

In  co-channel  interference  environments,  it  is 
obvious  that  a  larger  error  correction  capability  reduces 
required  average  signal-to-interference  power  ratio 
(SIR).  Therefore,  the  same  frequency  can  be  used  in 
closer  cells  when  lower  rate  codes  are  used.  This 
increases  the  system  spectrum  efficiency.  However,  the 
lower  rate  codes  require  larger  transmission 
bandwidth, and this decreases the efficiency.

In  the  analysis  of  the  spectrum  efficiency  of  cellular 
mobile  radio  systems  employing  coded  MDPSK,  the 
service  area  is  defined  as  the  area  in  which  practical, 
reliable  communications  are  possible.  It  is  shown  that 
for  a  given  channel  bandwidth,  the  spectrum 
efficiencies  are  maximized  when  the  information  bit 
rate  is  around  0.5  information  bits  /symbol  for  2DPSK, 
1  bit  /symbol  for  4DPSK,  and  1.4  bits  /symbol  for  other 
MDPSK  schemes.  For  a  given  information  bit  rate, 
spectrum  efficiency  is  increased  with  larger  M  values. 
The  optimal  code  rate  for  32DPSK  is  around  0.3,  and 
the  maximal  spectrum  efficiency  is  2.6  times  as  large  as 
that  for  1  bit-per-symbol  coded  4DPSK. 
Abstract

In this paper an upper bound on bit error probability of a maximum-likelihood
sequence estimation (MLSE) equalizer for trellis-coded modulation
(TCM) systems with diversity reception is derived for multipath
Rayleigh fading channels. Analytical and simulation results show that our
bound is good for all cases considered, especially when diversity reception
is used.


Summary 

Sequence  estimation  and  trellis-coded  modulation  are  effective  means 
to  combat  channel  impairments  such  as  intersymbol  interference  (ISI)  and 
channel  noise.  It  is  known  that  MLSE  joint  with  decoding  in  the  maximum- 
likelihood  sense  is  the  optimum  way  to  detect  TCM  signals  on  ISI  channels. 
On frequency-selective fading channels, the performance of MLSE for TCM
systems was analyzed in [1] [2] [3] under the assumption that fading is
so slow that the channel remains fixed during an entire error event. In
reality,  the  fade  speed  of  the  channel  is  closely  related  to  the  autocorrelation 
function  of  the  time-variant  channel  impulse  response.  In  this  paper,  a 
general  upper  bound  on  bit  error  probability  of  MLSE  for  TCM  systems 
on  multipath  Rayleigh  fading  channels  is  derived. 

We assume that diversity reception is available, so the whole channel
is modeled as D independent fading channels corrupted by i.i.d. complex
white Gaussian noise. Let $g_{k,i}^{d}$, $0 \le i \le L$, $1 \le d \le D$, denote the ith tap
coefficient of the dth diversity branch at time k in the equivalent discrete-time
channel model. Let $x_k$ be the output of the TCM encoder at time
k and $y_k^{d}$ be the corresponding output of the channel at the dth diversity
branch. Then

$$y_k^{d} = \sum_{i=0}^{L} x_{k-i}\, g_{k,i}^{d} + \eta_k^{d},$$

where $\{\eta_k^{d}\}$ are i.i.d. zero-mean complex Gaussian random variables with
variance $\sigma^2 = (1/2)E\{|\eta_k^{d}|^2\} = N_0$. Since the channel is assumed to be
Rayleigh faded, the tap coefficients $\{g_{k,i}^{d}\}$ are modeled as independent (in terms
of indices i and d) zero-mean complex Gaussian random variables (but
are correlated in index k).

Let $v = \{v_k\}$ be the transmitted information sequence and $v \oplus e = \{v_k \oplus e_k\}$
be the information sequence at the receiver output. Since $\{x_k\}$ is the
transmitted signal sequence of the information sequence $\{v_k\}$, let $\{x_k + \varepsilon_k\}$
denote the corresponding signal sequence of $\{v_k \oplus e_k\}$. By employing the
union-bounding technique, the average bit error probability for an MLSE
receiver can be bounded by


and 


Cs-i  A 
-£s-i  ; 


£t-i  I 
es'i  R 


1 


Since in general the matrix A' may be singular, let, without loss of generality,
the first M rows be independent and As = ot.mAJn, for

i  =  M  -f  1, . .  .,iN,  where  A^  is  the  kth  row  of  A'.  Let  A  be  the  resulting 
M × M submatrix of A' obtained by deleting dependent rows and columns. We can
show that the pairwise error probability can be Chernoff-bounded by


P{r(v  ®  e)  >  r(v)|v}  < 


I  [  Pm  Y 

2[2*^/»|A|'^*J  ’ 


where  Bit  can  be  computed  from  the  following  recursive  relation: 
Initialisation: 

+  sib;I2sssr+i 

For(t  =  2,3,...,‘'M){ 

For(«  =  t . M-,j  =  k,...,M){ 

_/fc  IV 


} 

The recursive formula is simple and can be easily carried out on a computer.
The final upper bound on bit error probability is computed from
the system’s error-state diagram and by a stack algorithm similar to that
in [1].

In the following an example is given for a 4-state 8-PSK TCM scheme
on  a  two-tap  fading  channel.  The  autocorrelation  function  of  tap  coefficients 
is  modeled  as 


2*“  =  I 


0, 

(r?*(2»/,mr), 


i  =  y. 


Analytical and simulation results are shown in Figure 1, where four normalized
fade rates $f_D T$ = 0.005, 0.03, 0.05, 0.08 are considered. From the
results  shown,  our  bound  is  good  for  all  the  cases  considered  especially 
when  diversity  reception  is  available.  It  accurately  predicts  the  tendency 
of  the  performance  curves. 
ft  <  -  13  W.(e)53;P{y}P{r(y®e)  >  r(y)|y},

"eeE  » 

where E is the set of all error events, Wk(e) is the number of bit errors
of the error sequence e, P{v} is the prior probability of the transmitted
information sequence v, and r(v) is the path metric of v. The pairwise
error  probability  should  be  averaged  over  the  fading  characteristic,  which 
gives 

P{r(y®«)  >  r(v)|y)  =  j  P{r(v®e)>  r(y)|v,g)p(g')  -  jKg^ldg*  •  dg". 

We assume perfect channel estimate and the same fading characteristics
for all diversity branches in our derivation. Let $g_{k,i} = g_{k,i,R} + j g_{k,i,I}$, where

the superscript d is ignored because of our assumption, and $\varepsilon_k = \varepsilon_{k,R} + j\varepsilon_{k,I}$.

For  an  error  event  of  length  N,  let  the  covariance  matrix  of  the  random 
sequence  (gt,t  h,  »i,(  . . iN.i  Ai  SN.i  i)  be  R<.  Consider  the  matrix 

t 

A'  =  53ArRA(, 

where 
Figure 1: Analytical and simulation results for a 16-state 8-PSK MLSE on
multipath Rayleigh fading channels with $f_D T$ = 0.005, 0.03, 0.05, 0.08.
1. Introduction

In maximum-likelihood decoding under non-Gaussian
noise, the decoding region is bounded by
complex curves instead of the perpendicular bisector
that corresponds to Gaussian noise. Therefore, the
error rate cannot be evaluated from the Euclidean distance.
The Bhattacharyya distance is adopted since it can
evaluate the error performance for noise with an
arbitrary distribution.

Upper bound formulae for the bit error rate and the
event error rate are obtained based on the error-weight-profile.
Using the bound, the optimum code is
searched for.

2. System Model

Figure 1 shows the system model discussed here. An
information bit stream is fed to a linear convolutional
encoder of rate 2/3 and mapped to the 8-AM signal of
equal signal-spacing by natural mapping. The received
signal, disturbed by non-Gaussian noise, is fed to a
Viterbi decoder for maximum-likelihood decoding. The
decoder is assumed to know the probability density of
the noise.

3.  Upper  bound  of  error  rate 

The Bhattacharyya distance, BD(A,B), between signal points A and
B is given by

$$BD(A,B) = -\ln\left[\int_{\text{whole space}} \{P_n(x - x_A)\,P_n(x - x_B)\}^{1/2}\,dx\right]$$

where $P_n(x)$ is the probability density of the received noise.
The union bound on the event or bit error rate is estimated by
the error-weight-profile method [1], using the Bhattacharyya
distance instead of the squared Euclidean distance.
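As a rough numerical illustration of the distance defined above, the sketch below integrates the square root of the product of two shifted noise densities with the midpoint rule. The Laplace density is only an assumed stand-in for a heavy-tailed (impulsive) noise; it is not the measured density of the paper, and the function names, integration limits and step count are illustrative choices.

```python
import math


def bhattacharyya_distance(pdf, x_a, x_b, lo=-40.0, hi=40.0, steps=100_000):
    """Approximate BD(A,B) = -ln( integral of sqrt(p(x-x_a) * p(x-x_b)) dx )
    with the midpoint rule on [lo, hi] (assumed wide enough for the density)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += math.sqrt(pdf(x - x_a) * pdf(x - x_b))
    return -math.log(total * h)


def gaussian_pdf(sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return lambda x: math.exp(-x * x / (2.0 * sigma * sigma)) / norm


def laplace_pdf(sigma):
    """Zero-mean Laplace density with variance sigma**2 (assumed heavy-tailed model)."""
    b = sigma / math.sqrt(2.0)
    return lambda x: math.exp(-abs(x) / b) / (2.0 * b)


# Two signal points at spacing 2, unit noise variance in both cases.
bd_gauss = bhattacharyya_distance(gaussian_pdf(1.0), 0.0, 2.0)
bd_laplace = bhattacharyya_distance(laplace_pdf(1.0), 0.0, 2.0)
```

For Gaussian noise the result reduces to the squared Euclidean distance divided by $8\sigma^2$, which is why the Euclidean distance suffices in that case; for other densities the two measures are no longer proportional.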

4. Optimum code search

To determine the optimum code for an impulsive noise channel, the
upper bound on the bit error rate is calculated for each code
whose encoder has a given shift-register length. The best code is
the one with the minimum upper bound on the bit error rate.

To lighten the computational burden, a suboptimum search is also
attempted. In the suboptimum search, the candidate codes having
the maximum free Bhattacharyya distance are found in the first
stage; in the next stage, the upper bound on the bit error rate
is calculated for the candidate codes and the suboptimum code is
determined.
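The two search strategies can be sketched as follows. This is only a schematic outline: `free_distance` and `ber_bound` are hypothetical callables standing in for the free Bhattacharyya distance and the upper bound on the bit error rate, and the numbers in the toy table are invented purely to exercise the code.

```python
def exhaustive_search(codes, ber_bound):
    """Optimum search: evaluate the BER bound for every code, keep the smallest."""
    return min(codes, key=ber_bound)


def suboptimum_search(codes, free_distance, ber_bound, tol=1e-9):
    """Two-stage search: first keep the codes with maximum free distance,
    then evaluate the (more expensive) BER bound only on those candidates."""
    distances = {code: free_distance(code) for code in codes}
    best = max(distances.values())
    candidates = [code for code in codes if distances[code] >= best - tol]
    return min(candidates, key=ber_bound)


# Invented toy values: code name -> (free distance, BER upper bound).
toy = {"c1": (3.0, 2e-4), "c2": (4.0, 5e-5), "c3": (4.0, 8e-5), "c4": (3.5, 1e-5)}
codes = list(toy)
optimum = exhaustive_search(codes, lambda c: toy[c][1])
suboptimum = suboptimum_search(codes, lambda c: toy[c][0], lambda c: toy[c][1])
```

In the toy table the two searches return different codes, which shows why the second strategy is only suboptimum: a code with a smaller free distance can still have a lower bound.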

5.  Results  of  code  search 

Figure 2 shows the probability density of the impulsive noise,
modeled from an observation in digital subscriber loops. For this
noise, the optimum or suboptimum code is searched for among codes
having Massey-type encoders with a shift register of up to 4
bits. Figure 3 shows the relation between the upper bound and the
simulation result. Figure 4 shows the resulting BER. By using the
suboptimum code with a 4-bit encoder, a coding gain of 20 dB is
obtained at the bit error rate 10“®. This is 11 dB more than that
obtained by Ungerboeck's code. The detailed results are reported
in [2].

[1] E. Zehavi and J. Wolf, IEEE Trans. IT, IT-33, pp. 196-202 (1987).
[2] H. Ogiwara and H. Irie, IEICE Trans., E75-A, pp. 1063-1070 (1992).
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Fig.  1 
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Fig. 2 Probability distribution
of an impulsive noise and the
Gaussian noise of the same
variance.


Fig. 3 Relation between the upper bound and simulation results
(axis: error rate, upper bound).
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Fig. 4 Bit error rate by simulation of optimum (encoder shift
register length ν = 2, 3) or suboptimum (ν = 4) codes and bit
error rate without coding.

"UB code" is the code found by Ungerboeck. Squares: ν = 2;
circles: ν = 3; triangles: ν = 4.
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In this paper, we propose a modified
symbol-rate-increased TCM for digital microwave radio. The
original symbol-rate-increased TCM accomplishes coding redundancy
through bandwidth expansion, instead of through signal point
expansion, in order to obtain a greater coding gain than the
Ungerboeck-type TCM. A drawback of this scheme is that it
requires the bandwidth to be expanded by a fixed factor m/(m-1)
for a $2^m$-QAM system.

The proposed scheme permits the bandwidth expansion ratio to be
set to an arbitrary value smaller than m/(m-1). Simulation
results show that the proposed scheme achieves a lower bandwidth
expansion ratio than the symbol-rate-increased TCM with only
slightly reduced coding gain.

Error  Control  Schemes  for  DMR  Systems 

Quadrature Amplitude Modulation (QAM) is widely used in Digital
Microwave Radio (DMR) systems. Error control is indispensable for
realizing highly reliable multilevel QAM systems, particularly
when the number of constellation points is large. Two principal
error control schemes for DMR systems are block coding and
Trellis Coded Modulation (TCM). Block coding, such as BCH or
Reed-Solomon codes, can be easily adapted to DMR (1), because it
allows the bandwidth expansion ratio to be set to an arbitrary
value. Though block coding is utilized in many commercial
systems, the coding gain is rather small (2-3 dB at BER = 10~^).


Fig. 1 Symbol-rate-increased TC-256QAM

Fig. 2 Proposed 256QAM scheme


Saito proposed a new scheme, named symbol-rate-increased TCM, for
DMR systems (2). It accomplishes coding redundancy through
bandwidth expansion, instead of through signal-set expansion, and
achieves a remarkable coding gain (greater than 5 dB at BER>10 ).
A drawback of this scheme is that it requires the bandwidth to be
expanded by a fixed factor m/(m-1) for $2^m$-QAM. For example,
the bandwidth expansion ratio for the symbol-rate-increased
TC-256QAM (Figure 1) is 8/7.

Modified Symbol-Rate-Increased TCM Scheme
The scheme proposed in this paper permits the bandwidth expansion
ratio to be set to an arbitrary value smaller than m/(m-1). In
the scheme, an m-bit parallel input data sequence is converted
into $m_1$-bit and $m_2$-bit parallel sequences with different
data rates, and then the $m_1$-bit sequence is encoded by a
trellis encoder whose coding rate is r. The overall bandwidth
expansion ratio is then $m/((m-m_2)r+m_2)$, which is smaller than
m/(m-1) if r is greater than $(m-m_2-1)/(m-m_2)$. As an example,
we have designed a scheme for 256QAM whose bandwidth expansion
ratio is 16/15 (Figure 2).
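The bandwidth expansion ratio $m/((m-m_2)r+m_2)$ and the rate threshold $(m-m_2-1)/(m-m_2)$ can be checked with exact rational arithmetic. The parameter pair $m_2 = 6$, $r = 3/4$ below is an assumed example that happens to give 16/15 for m = 8; the text does not state which values the authors' 256QAM design actually uses.

```python
from fractions import Fraction


def bandwidth_expansion_ratio(m, m2, r):
    """Overall ratio m / ((m - m2) * r + m2) for trellis code rate r."""
    r = Fraction(r)
    return Fraction(m) / ((m - m2) * r + m2)


def min_rate_for_improvement(m, m2):
    """Code rate above which the ratio drops below the fixed factor m / (m - 1)."""
    return Fraction(m - m2 - 1, m - m2)


# Setting m2 = 0 and r = (m - 1) / m reproduces the fixed factor m / (m - 1).
fixed_factor = bandwidth_expansion_ratio(8, 0, Fraction(7, 8))

# Assumed illustrative parameters for 256QAM (m = 8).
modified = bandwidth_expansion_ratio(8, 6, Fraction(3, 4))
threshold = min_rate_for_improvement(8, 6)
```

With these assumed values the code rate 3/4 exceeds the threshold 1/2, so the ratio falls below 8/7, in agreement with the condition stated in the text.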


Bit  Error  Rate  performance 
Figure 3 shows the bit error rate performance for three different
schemes: the symbol-rate-increased TC-256QAM, the proposed 256QAM
scheme with a bandwidth expansion ratio of 16/15, and a practical
256QAM scheme with a (255,239) BCH code. The number of encoder
states is 32. The coding gains for the symbol-rate-increased TCM
and for the proposed scheme are 5.1 dB and 4.3 dB, respectively,
at a BER of 10~^. Though the figure for the proposed scheme is
slightly smaller than that obtainable with the original
symbol-rate-increased TCM, it shows that the proposed scheme can
attain a remarkable improvement over a BCH coded 256QAM with the
same bandwidth expansion ratio.
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ABSTRACT 

Multi-h partial response CPM signals are analyzed at a preselected
number of states, in contrast to the standard method of analyzing
them at a preselected number of modulation indices. Optimal
multi-h phase codes which produce the highest minimum Euclidean
distance are presented according to the number of states. Three
orders of multi-h signaling (2-h, 3-h, and 4-h) are considered in
the optimal code search.

1.  INTRODUCTION 

In  the  literature  multi-h  phase  codes  have  been  exten¬ 
sively  considered  with  continuous  phase  modulation  (CPM) 
with  regard  to  their  performance,  spectral  properties,  detec¬ 
tion  etc  [1-4].  It  is  known  that  the  performance  of  CPM 
signaling  can  be  improved  by  combining  with  multi-h  phase 
codes  while  maintaining  the  properties  inherent  to  the  phase 
continuity  of  the  signals  [1-4]. 

A k-h CPM signal changes its modulation index cyclically at the
end of every interval over k values ($h_0, h_1, \ldots, h_{k-1}$).
Binary k-h partial response signals considered in this study
generally take the form

$$A\cos\left[\omega_c t + 2\pi a_n h_n \int_{nT}^{t} g(\sigma - nT)\,d\sigma + \Phi_0\right]; \quad nT < t < (n+1)T. \quad (1)$$

The error rate performance of any wideband CPM signaling system
is usually expressed in terms of the minimum Euclidean distance
of the signals. In fact, the asymptotic error probability at high
signal-to-noise ratio is approximately given by [1]

$$P_e \approx Q\left(\sqrt{d_{\min}^2 E_b / N_0}\right). \quad (2)$$

It is seen from (2) that the performance of a CPM system can be
improved by increasing the minimum Euclidean distance. The
minimum Euclidean distance can generally be increased by
increasing the constraint length of the signals. The total number
of states N' is the product of the number of phase states N and
the number of symbol states.
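To see how strongly the minimum distance enters the asymptotic error probability in (2), the Gaussian tail function Q can be evaluated through the complementary error function. This is a small sketch; the distance values and the 10 dB operating point used below are arbitrary examples, not results from the paper.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def asymptotic_error_probability(d2_min, eb_n0_db):
    """High-SNR approximation Pe ~ Q(sqrt(d2_min * Eb/N0)),
    with d2_min the normalized squared minimum Euclidean distance."""
    eb_n0 = 10.0 ** (eb_n0_db / 10.0)
    return q_function(math.sqrt(d2_min * eb_n0))


# Example values only: compare two normalized squared distances at 10 dB.
pe_small_distance = asymptotic_error_probability(2.0, 10.0)
pe_large_distance = asymptotic_error_probability(3.0, 10.0)
```

Because the distance sits inside the square root of the Q-function argument, a modest increase in $d_{\min}^2$ lowers the error probability by a large factor at high signal-to-noise ratio.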

The complexity of a multi-h signaling scheme mainly depends on
the number of modulation indices (or the order of signaling) k,
the number of states N', and the receiver path memory length
$N_r$ [1, 2]. Among them the number of states can be considered
the most significant factor, as it is necessary to compute 2N'
metrics at the end of every interval for trellis decoding [1].
Considering the factors which determine the complexity, it is
more appropriate to analyze multi-h signals at preselected values
of N' in situations where the allowed complexity is limited.
Since partial response signals generally have better spectral
properties than full response signals, the results presented here
are more beneficial than those in [2], especially when designing
bandlimited systems.


2. EVALUATION OF OPTIMAL MULTI-h CODES

The evaluation of the optimal multi-h codes for both 2REC and 2RC
signals at any given value of N' (or N) consists of the
evaluation of the optimal number of modulation indices, k, and
determination of the set of k optimal modulation indices
($h_0, h_1, \ldots, h_{k-1}$) to maximize the minimum Euclidean
distance. For any selected value of k, the average modulation
index, which is defined as the mean value of the set of
modulation indices, can take only discrete values, which are
determined by the selection of the set of integers in the
numerators of the modulation indices.

Since  the  maximization  of  the  constraint  length  does  not 
necessarily  maximize  the  minimum  Euclidean  distance,  the 
direct  maximization  of  the  minimum  Euclidean  distance  has 
to  be  carried  out  over  all  possible  values  of  k  to  evaluate 
the  optimal  codes.  In  order  to  reduce  the  complexity  of 
calculations  and  of  the  resulting  signaling  schemes,  only  2- 
h,  3-h  and  4-h  signals  with  k=2,  k=3  and  k=4  respectively, 
are considered in the optimal code search. It is important to
note that even though the set of possible $h_{avg}$ values in the
range $0 < h_{avg} < 1$ differs from one value of k to the other,
the best value of k and the corresponding $h_{avg}$ value, along
with the optimal combination of modulation indices which produces
the highest minimum Euclidean distance, can be determined in the
range $0 < h_{avg} < 1$.

At  any  given  value  of  k,  the  best  combination  of  mod¬ 
ulation  indices  at  each  possible  havg  value  is  numerically 
found  by  searching  over  all  possible  combinations  of  mod¬ 
ulation  indices.  Since  the  primary  objective  is  to  maximize 
the  minimum  Euclidean  distance,  all  special  combinations 
of  modulation  indices  are  also  considered.  These  special 
combinations  include  the  ones  with  any  number  of  equal 
modulation indices including the constant h signals. Further, in
all of the minimum Euclidean distance calculations the
minimization is carried out over all cyclic shifts of the
modulation index pattern in order to take into account the paths
that originate at the beginning of all signaling intervals.
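The structure of this search (all combinations of indices, grouped by average index, with the distance minimized over cyclic shifts) might be organized as in the sketch below. It assumes indices of the form p/q with a common denominator q and 0 < h < 1, and it takes the distance computation as a hypothetical callable `distance_fn`; the actual minimum Euclidean distance calculation for 2REC or 2RC pulses is not implemented here.

```python
from fractions import Fraction
from itertools import product


def cyclic_shifts(indices):
    """All cyclic rotations of a tuple of modulation indices."""
    return [indices[i:] + indices[:i] for i in range(len(indices))]


def best_codes_by_average(k, q, distance_fn):
    """For each average index, keep the k-tuple of indices p/q whose
    worst-case distance over all cyclic shifts is largest.
    Tuples with equal entries (constant-h signals) are included."""
    best = {}
    for numerators in product(range(1, q), repeat=k):
        h = tuple(Fraction(p, q) for p in numerators)
        worst = min(distance_fn(shift) for shift in cyclic_shifts(h))
        h_avg = sum(h) / k
        if h_avg not in best or worst > best[h_avg][0]:
            best[h_avg] = (worst, h)
    return best


# Toy stand-in for the distance: depends only on the first index, so the
# minimum over cyclic shifts equals the smallest index in the pattern.
table = best_codes_by_average(2, 4, lambda h: float(h[0]))
```

The overall optimum for a given k is then the entry of the table with the largest distance, and repeating the procedure for k = 2, 3 and 4 gives the best order of signaling.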
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Machine  learning  is  often  modeled  as  the  process  of  extrap¬ 
olating  samples  of  a  function.  This  extrapolation  requires  both 
samples and "inductive bias." Bias towards low complexity, as
in Occam's Razor, is particularly important. Kolmogorov
complexity was developed to formalize this process [5]. Important
theoretical results have also been developed using more abstract
measures  of  complexity  [4].  Therefore,  there  is  a  strong  theo¬ 
retical  basis  for  Occam-based  learning.  Kolmogorov  complexity 
is  a  general  measure,  which  would  allow  learning  of  many  dif¬ 
ferent  kinds  of  functions  (i.e.  robust  learning);  however,  it  has 
been  proven  that  its  exact  computation  is  not  tractable.  There 
have  been  some  tractable  measures  of  complexity  used  in  actual 
implementations  of  Occam-based  learning  [3],  such  as  the  Ab- 
ductory  Inference  Mechanism  (AIM).  However,  these  measures 
of  complexity  are  relatively  narrow,  which  implies  non-robust 
learning.  The  challenge  is  to  develop  robust  and  tractable  mea¬ 
sures  of  complexity. 

One  approach  to  this  challenge  is  called  pattern  theory  [6], 
where we think of robust complexity determination as the problem
of finding a pattern. Pattern theory uses Decomposed Function
Cardinality (DFC), proposed by Y. S. Abu-Mostafa as a general
measure of complexity [1, p. 128]. We demonstrate that this
measure of complexity allows robust learning, yet is sufficiently
tractable to support the learning of non-trivial functions. We
develop support for the generality of the measure
both  theoretically  and  experimentally.  Generality  is  supported 
theoretically  by  proving  its  relationship  to  the  conventional  mea¬ 
sures  of  circuit  complexity,  time  complexity  and  program  length. 
Generality  is  supported  experimentally  by  using  a  decomposi¬ 
tion  program  (referred  to  as  AFD)  derived  from  the  work  of 
R.  L.  Ashenhurst  [2]  and  others. 

The  experimental  work  includes  the  measurement  of  the  DFC 
of  a  large  variety  of  functions,  determining  the  correlation  of 
DFC with more specialized measures within that domain of
application, and machine learning experiments. The DFC of over
800 non-randomly generated functions was measured, including many
kinds of functions (numeric, symbolic, chaotic, string-based,
graph-based, images and files). Roughly 98 percent of the
non-randomly generated functions had low DFC (versus less
than 1 percent for random functions). The 2 percent that did
not  decompose  were  the  more  complex  of  the  non-randomly  gen¬ 
erated  functions  rather  than  some  class  of  low  complexity  that 
AFD  could  not  deal  with.  It  is  important  to  note  that  when 
AFD  says  the  DFC  is  low,  which  it  did  some  800  times,  it  also 
provides  an  algorithm  (or  a  description  of  the  pattern  found). 
AFD  found  the  classical  algorithms  for  a  number  of  functions. 

The  correlation  coefficient  between  DFC  and  a  ranking  of 
the  complexity  of  images  by  people  was  0.8.  The  correlation 
between  DFC  and  the  compression  factor  of  two  commercial 
data compression programs was about 0.0. The correlation between
DFC and the Lyapunov exponent for logistic functions was 0.0.
These high correlations show that the single measure, DFC,
reflects the essential structure in each, very different,
situation. A fourth correlation experiment found no correlation
between  people’s  ability  to  recognize  concepts  and  the  DFC  of 
those  concepts.  This  lack  of  correlation  may  be  a  result  of  the 
narrow  range  of  DFC’s  involved  (5%  of  its  full  range). 

In learning experiments, AFD did as well as a back-propagation
trained Neural Network (NN) on problems well-suited to NN's.
However, on other problems such as parity, AFD learned a 256
point function from 50 samples whereas the NN required all 256
points. The findings were similar for the AIM program.
In  both  cases,  the  extrapolations  of  the  NN  and  AIM  were  not 
robust  while  AFD  consistently  learned  functions  of  low  complex¬ 
ity  with  few  samples. 

The experiments to date have been limited to small (less
than 10 variables) binary functions. The results on these small,
but  non-trivial,  functions  have  consistently  pointed  to  a  promis¬ 
ing  ability  to  find  many  different  kinds  of  patterns.  Therefore, 
we  believe  these  results  are  the  first  demonstration  of  robust 
Occam-based  learning  and  help  join  an  important  body  of  the¬ 
oretical  results  with  practical  machine  learning. 
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Artificial  neural  network  training  algorithms,  based  upon 
gradient search minimizations of cost/loss functions when the
training  set  contains  many  high-dimensional  inputs,  are  time- 
consuming  and  can  only  address  the  issue  of  an  appropriate 
network  architecture  (network  topology)  either  through  re¬ 
peated  training  of  different  networks  or  the  addition  of  penalty 
functions  so  as  to  control  the  problem  of  ‘overfitting’  that  re¬ 
sults  from  ‘overtraining’,  a  counterpart  to  the  classical  statis¬ 
tical  estimation  issue  of  the  balance  between  bias  and  vari¬ 
ance  (e.g.,  Geman,  et  al.  [1992]).  We  propose  combining  ideas 
drawn  from  the  nonparametric  regression  approaches  of  projec¬ 
tion  pursuit  (PP)  (Huber  [1985]),  the  recent  idea  of  sliced  in¬ 
verse  regression  (SIR)  (Li  [1991]),  backfitting  (Hastie  and  Tib- 
shirani  [1990]),  and  the  design  of  scalar  smoothers  via  Gaussian 
kernel  smoothing  (Hastie  and  Tibshirani  [1990]),  to  decompose 
the  neural  network  design/training  process  into  one  involving 
gradient  methods  only  at  the  final  stages  where  we  need  only 
fit  scalar-valued  functions  of  scalar  inputs. 

From PP we take the class of regression functions

$$\hat{y} = \sum_{i=1}^{m} g_i(a_i^T x),$$

where the regressor vector (input) is x, the response (output)
variable is y, and we need to select the 'ridge functions'
$\{g_i\}$ and the projection directions $\{a_i\}$ as well as the
number of
terms  m.  This  model  can  be  embedded  in  a  neural  network 
where  we  may  use  several  nodes  to  approximate  each  of  the 
ridge  functions.  Our  goal  is  to  improve  the  efficiency  of  fit¬ 
ting  this  model  to  training  data  by  introducing  a  fairly  direct 
method  for  extracting  the  projection  directions.  We  need  then 
only  rely  upon  time-consuming  iterative  methods  to  approxi¬ 
mate  the  scalar-input  ridge  functions  by  scalar-input  network 
node  functions. 
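The projection pursuit model used here, a sum of m scalar ridge functions each applied to a one-dimensional projection of the input, can be evaluated as in the following sketch. The two directions and ridge functions in the example are arbitrary placeholders chosen for illustration, not fitted values.

```python
import math


def pp_predict(x, directions, ridge_functions):
    """Evaluate sum_i g_i(a_i . x) for input vector x, projection
    directions a_i and scalar ridge functions g_i."""
    total = 0.0
    for a, g in zip(directions, ridge_functions):
        projection = sum(aj * xj for aj, xj in zip(a, x))
        total += g(projection)
    return total


# Placeholder model with m = 2 terms: one linear and one sigmoidal ridge function.
directions = [(1.0, 0.0), (0.0, 1.0)]
ridge_functions = [lambda t: 2.0 * t, math.tanh]
y_hat = pp_predict((3.0, 0.0), directions, ridge_functions)
```

Once the directions are fixed, each ridge function sees only a scalar input, which is what makes the final fitting step a set of one-dimensional smoothing problems.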

Our  experience  with  backpropagation  (BP)  methods  ap¬ 
plied  to  forecasting  electric  load  time  series  (Yuan  and  Fine 
[1992])  suggests  that  often  the  resulting  neural  network  has 
several  sigmoidal  nodes  operating  about  the  origin  where  they 
are essentially linear; e.g., for the standard logistic node
$\sigma(z) = [1 + e^{-z}]^{-1}$ this could be the region |z| < 1.
Hence, we select initially $g_i$ to be linear. SIR provides a
direct method for calculating the weight vector a when the 'true'
model is

y  =  +  t)  -t- «, 

for noise ε independent of input x, and x elliptically
symmetrically distributed. While this is not likely to be our
situation,
we  can  use  the  SIR  algorithm  to  provide  reasonable  projection 

* Prepared with partial support from NSF Grant
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directions for the PP regression function and improve them
through a process of iterative backfitting. Each ridge function
$g_i$ is trained to model the residual y' resulting from the
difference of the actual response y and the output of the other
m - 1 ridge functions. The number m is selected so that the
estimated residual error is below a pre-assigned threshold. Given
that we are selecting $g_i$, we first use SIR to calculate the
direction $a_i$ and, keeping the direction fixed, we use Gaussian
kernel functions to smooth the resulting scatter plot
$\{(a_i^T x_j, y'_j)\}$ relating the scalar response residual
$y'_j$ to the jth input vector. At the conclusion of the
iterative backfitting process, in
which  we  cycle  repeatedly  through  the  m  terms  in  the  regres¬ 
sion  equation,  we  then  use  backpropagation  to  approximate  the 
smoothed  terms  in  the  PP  equation  by  small  neural  network 
subnets  that  now  have  scalar  inputs  and  outputs. 

Concerned  by  the  hypothesis  of  elliptically  symmetrically 
distributed  x  required  by  SIR,  we  have  developed  a  related 
method  that  uses  a  new  projection  index  motivated  by  the 
continuity of the unknown regression function g. As in SIR, we
slice the output values into H intervals and estimate projection
directions based on the groups $\{I_h, h = 1, \ldots, H\}$ of
input values that share output values in the same interval. For
example, we find a single projection direction

$$a_1 = \arg\min_{a,\ \|a\|=1} a^T \left[\sum_{h=1}^{H} \sum_{i,j \in I_h} (x_i - x_j)(x_i - x_j)^T\right] a.$$

We have applied these nonparametric regression-based training
processes to the construction of a neural network to forecast
daily extremes (daily minimum, evening peak, morning peak) of
demand for electric power, using data supplied by a midwestern
utility, and will compare our results with those we have achieved
previously (Yuan and Fine [1992]) using backpropagation-based
methods.
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Abstract 


A new notion of empirical informational divergence between two
individual sequences is introduced. If the two sequences are
independent realizations of two stationary Markov processes, the
empirical relative entropy converges to the true divergence
almost surely. This new empirical divergence is based on a
version of the Lempel-Ziv data compression algorithm.

A simple universal classification algorithm for individual
sequences into a finite number of classes, which is based on the
empirical divergence, is introduced. It discriminates between the
classes whenever they are distinguishable by some finite-memory
classifier, for almost every given training set and almost any
test sequence from these classes. It is universal in the sense of
being independent of the unknown sources.

Summary 

Suppose one observes a sequence $x = (x_1, \ldots, x_n)$ emitted
from an unknown $\ell$-th order stationary Markov process
$p(\cdot)$ over a finite alphabet A with |A| = A letters, and
wishes to estimate the nth order entropy, or equivalently
$-n^{-1}\log p(\cdot)$. While the straightforward approach of
calculating the $\ell$-th order conditional empirical
distribution is computationally prohibitively complex for large
$\ell$ and is impossible if $\ell$ is unknown, it has been shown
in [1], [2] that the Lempel-Ziv (LZ) codeword length for x,
divided by the length n, is a computationally efficient, reliable
estimate of the entropy, and hence also of $-n^{-1}\log p(\cdot)$.

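More precisely, let
$p(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} p(x_i \mid s_{i-1})$,
where $s_i = (x_{i-\ell+1}, \ldots, x_i)$ for $i \ge \ell$ and
where $s_i = (s_0, x_1, x_2, \ldots, x_i)$ for $i < \ell$, $s_0$
being the initial state; $s_i \in A^\ell$, and $A^\ell$ is the set
of all length-$\ell$ vectors with components in A.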

Let c(x) denote the number of phrases in x resulting from the
incremental parsing of x [1], i.e., sequential parsing of x into
distinct phrases such that each phrase is the shortest string
which is not a previously parsed phrase. Then, the LZ codelength
for x can be approximated by c(x) log c(x), and
$\lim_{n\to\infty} n^{-1}[-\log p(x) - c(x)\log c(x)] = 0$ almost
surely. In fact, this property still holds as long as $p(\cdot)$
is more generally a stationary ergodic process.
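The incremental parsing and the resulting estimate $n^{-1} c(x)\log c(x)$ can be sketched in a few lines. The base-2 logarithm (an estimate in bits per symbol) is an assumption made here, since the text does not fix the base, and a trailing incomplete phrase is counted as one phrase by convention.

```python
import math


def lz_phrase_count(x):
    """Number of phrases c(x) in the incremental (LZ78-style) parsing of x:
    each phrase is the shortest string not previously parsed as a phrase."""
    seen = set()
    phrase = ()
    count = 0
    for symbol in x:
        phrase += (symbol,)
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ()
    if phrase:
        # Leftover symbols repeat an earlier phrase; count them as a final phrase.
        count += 1
    return count


def lz_entropy_estimate(x):
    """Estimate c(x) * log2 c(x) / n in bits per symbol for a non-empty x."""
    c = lz_phrase_count(x)
    return c * math.log2(c) / len(x)


# The string parses as 1 | 0 | 11 | 01 | 010 | 00 | 10.
phrases = lz_phrase_count("1011010100010")
estimate = lz_entropy_estimate("1011010100010")
```

On such a short string the estimate is far from the true entropy; the convergence stated in the text is asymptotic in n.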

Here we generalize this result to the case where there are two
stationary $\ell$th order Markov sources, $p(\cdot)$ and
$q(\cdot)$. Let x and z be realizations of $p(\cdot)$ and
$q(\cdot)$, respectively. Given x and z, we would like to
estimate reliably $-n^{-1}\log p(z)$ and, similarly,
$-n^{-1}\log q(x)$. In particular, we seek an easily calculable
function of x and z, independent of $\ell$, which discriminates
between two unknown Markov sources $p(\cdot)$ and $q(\cdot)$. To
this end, recall that the divergence D(q||p), defined as

$$D(q\|p) = \limsup_{n\to\infty} \frac{1}{n} E_q \log\frac{q(X)}{p(X)}, \quad (1)$$

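is intuitively interpreted as a measure of distance between
$p(\cdot)$ and $q(\cdot)$ [2]. In the Markovian case considered
here, we have

$$D(q\|p) = \sum_{s \in A^\ell} q(s) \sum_{a \in A} q(a|s) \log\frac{q(a|s)}{p(a|s)}, \quad (2)$$

where p(a|s) and q(a|s) are the conditional probabilities of a
letter $a \in A$ given a state $s \in A^\ell$ under $p(\cdot)$ and
$q(\cdot)$, respectively, and q(s) is the stationary probability
of state s under $q(\cdot)$. Let
$q_z(a_1, \ldots, a_{\ell+1}) = q_z(a^{\ell+1})$ be the relative
frequency of an $(\ell+1)$-length string
$a^{\ell+1} \in A^{\ell+1}$ in z. Then, it is well known that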

(3) 


where 

$$H_z = -\sum_{s \in A^\ell} \sum_{a \in A} q_z(s, a) \log q_z(a|s). \quad (4)$$

Analogously to the single source case, where $-n^{-1}\log q(z)$
is efficiently estimated by $n^{-1}c(z)\log c(z)$, we introduce
an empirical quantity Q(z||x) which will be shown to have the
property

$$\lim_{n\to\infty}\left[-\frac{1}{n}\log p(z) - Q(z\|x)\right] = 0$$

almost surely w.r.t. the product measure p × q, for every finite
$\ell$. Following (3), the function Q(z||x) can be decomposed
into two terms, the first of which is an estimate of the
empirical entropy associated with z, i.e., $n^{-1}c(z)\log c(z)$,
and the second, denoted by Δ(z||x), is an estimate of the
divergence between $q_z(\cdot)$ and $p(\cdot)$ with the property
$\lim_{n\to\infty}[\Delta(z\|x) - D(q_z\|p)] = 0$ almost surely
with respect to the product measure p × q, for every finite
$\ell$. In parallel to the fact that the entropy is estimated by
self LZ incremental parsing of z, here intuition suggests that
Δ(z||x), which is an estimate of the cross entropy $D(q_z\|p)$,
will be associated with cross parsing of z with respect to x.

Specifically, the cross parsing procedure of z w.r.t. x works as
follows. First, find the longest prefix of z that appears as a
string in x, i.e., the largest integer m such that
$(z_1, z_2, \ldots, z_m) = (x_i, x_{i+1}, \ldots, x_{i+m-1})$ for
some i. The string $(z_1, z_2, \ldots, z_m)$ is defined as the
first phrase of z with respect to x. If m = 0 (i.e., $z_1$ does
not appear in x), the first phrase of z with respect to x is
$z_1$. Thus, the case m = 0 is treated as though m = 1. Next,
start from $z_{m+1}$ and find, in a similar manner, the longest
prefix of $z_{m+1}, z_{m+2}, \ldots, z_n$ which appears in x, and
so on. The procedure is terminated once the entire vector z has
been parsed with respect to x. Let c(z|x) denote the number of
phrases in z with respect to x.
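The cross parsing procedure translates almost directly into code. The sketch below assumes both sequences are Python strings, so that the `in` operator performs the substring search; it favors clarity over speed, and an efficient implementation would use a suffix structure built on x instead of repeated searches.

```python
def cross_parse_count(z, x):
    """Number of phrases c(z|x) when z is parsed sequentially into the
    longest prefixes that occur as substrings of x."""
    count = 0
    pos = 0
    n = len(z)
    while pos < n:
        m = 0
        # Extend the current phrase while it still occurs somewhere in x.
        while pos + m < n and z[pos:pos + m + 1] in x:
            m += 1
        # A symbol that never occurs in x forms a phrase by itself (m = 0 -> 1).
        pos += max(m, 1)
        count += 1
    return count


similar = cross_parse_count("abababab", "ababababab")
dissimilar = cross_parse_count("abababab", "aabbaabb")
```

A sequence that resembles x is covered by a few long phrases, while a dissimilar one breaks into many short phrases, which is the intuition behind using c(z|x) to estimate the divergence term.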

Intuitively, Δ(z||x) may serve as a reasonable discrimination
function for universal classification of individual sequences.
Indeed, in contrast to the probabilistic framework in which the
classification problem is normally posed, we show that a
classifier based on the comparison of Δ(z||x) to a threshold
results in an asymptotically optimal performance for almost every
individual tested data sequence among all finite-memory
classifiers that are trained by given training sequences from
each class, have a rejection option, and that assign to each
class as small as possible a set of vectors so as to make the
sources distinguishable. We assume that any competing
finite-memory classifier is consistent in the sense that if a
test sequence, to be classified, appears in the training set it
will be classified correctly. It should be pointed out that while
the order of the optimal competing finite-memory classifier is
normally unknown, the discrimination procedure based on the
above-described cross parsing is independent of $\ell$ and
computationally efficient.
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Abstract 

Radial Basis Function (RBF) networks have been studied
intensively ([1], [9], [8], [3], [2], [12], [13]). Besides their
applications, several theoretical results have been obtained.
E.g., (1) the RBF net can be naturally derived from
regularization theory ([9]), (2) the RBF net has universal
approximation ability [4], (3) the RBF net also has best
approximation ability ([3], [5]).

In this paper, connections between the RBF network and the Kernel
Regression Estimator (KRE) are built up. Recent theoretical
results about KRE are used as tools to obtain theoretical results
on RBF nets in several aspects. First, the statistical
consistency of RBF nets is proved in various situations, which
extends the current results on the approximation ability (e.g.,
universal approximation) of RBF nets from the deterministic case
to the more practical stochastic case. Second, the convergence
rates of RBF nets are provided in different situations, which is
more useful than mere convergence of the network approximation in
the case of an infinite number of hidden units. For example, if
the mapping function to be learned is bounded and of o orders of
smooth, the $L_2$ convergence rate of an RBF net made of basis
functions with compact support is upper-bounded by 0{n~ ), which
also means that for a given error bound cj, the number of hidden
units is about the order of O(|col”**^^ ). This gives us some
quantitative insights into the design of RBF nets. Third, the
problem of selecting the appropriate size of the receptive field
of the radial basis function is investigated, and how the
selection of size is influenced by a number of factors is
elaborated. These studies are new in the literature and quite
useful for the further theoretical analysis of RBF nets as well
as for guiding the design of RBF nets in practice.

We consider the normalized version of the RBF net ([8])

$$f_n(x) = \frac{\sum_{i=1}^{n} w_i\,\phi([x - c_i]^T \Sigma^{-1} [x - c_i])}{\sum_{i=1}^{n} \phi([x - c_i]^T \Sigma^{-1} [x - c_i])} \quad (1)$$

where $\phi(r^2)$ is some prespecified basis function satisfying
some mild condition. The most common one is the Gaussian function
$\phi(r^2) = e^{-r^2}$, but a number of alternatives can also be
used ([9]). $c_i$ is called the center vector, which locates
$\phi(r^2)$ centered around $c_i$. $w_i \in R^m$ is a weight
vector corresponding to the center vector $c_i$. $\Sigma$ is a
d × d positive matrix which is usually chosen as
$\Sigma = \sigma^2 I$, with $\sigma$ called the size of the
receptive field of the basis function.
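A direct evaluation of the normalized RBF net, for the common choice of a Gaussian basis function and a receptive-field matrix equal to $\sigma^2 I$, might look as follows. This is a sketch with plain Python lists; the centers, weights and $\sigma$ in the example are arbitrary illustrative values.

```python
import math


def normalized_rbf(x, centers, weights, sigma):
    """Evaluate the normalized RBF net at x for Gaussian phi(r^2) = exp(-r^2)
    and Sigma = sigma^2 * I. weights[i] is the output vector tied to centers[i].
    Assumes x is close enough to some center that the denominator is nonzero."""
    activations = []
    for c in centers:
        r2 = sum((xj - cj) ** 2 for xj, cj in zip(x, c)) / sigma ** 2
        activations.append(math.exp(-r2))
    total = sum(activations)
    out_dim = len(weights[0])
    return [
        sum(a * w[k] for a, w in zip(activations, weights)) / total
        for k in range(out_dim)
    ]


# Two centers on a line; the query point lies exactly halfway between them.
centers = [(0.0, 0.0), (2.0, 0.0)]
weights = [(1.0,), (3.0,)]
midpoint_output = normalized_rbf((1.0, 0.0), centers, weights, sigma=1.0)
```

Because of the normalization, the output is always a convex combination of the weight vectors, so at the midpoint between two centers it is simply their average.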

For a given fixed $\phi(r^2)$, in eq. (1) there are three sets of
parameters: (1) $w_i$, i = 1, ..., n, which are the weight
vectors of the output layer of an RBF net, (2) the center vectors
$c_i$, i = 1, ..., n, and (3) the size $\sigma$. The last two
sets constitute the weights of the hidden layer of an RBF net.
Theoretically, all the parameters can be determined based on a
given sample set $(X_i, Y_i)$, i = 1, ..., N by minimizing the
following total approximation error

$$W(Y, f_n) = \frac{1}{N} \sum_{i=1}^{N} |Y_i - f_n(X_i)|^q \quad (2)$$

where 0 < q < ∞ and the usual case is q = 2.

However, the minimization with respect to all the parameters simultaneously
is a hard problem. It is usually assumed that $\sigma$ and $c_i$, $i = 1, \ldots, n$,
are determined from the samples $X_i$, $i = 1, \ldots, N$ ([10], [8], [3], [2]). In
this case, the minimization of the error in eq.(2) can be simplified considerably
since now it is performed with respect to $w_i$, $i = 1, \ldots, n$. A special case

is  that  9  =  2  in  which  the  solution  is  given  by  IV  =  Y h^{K h^)~*  with 
W  =  K.  «£„).  Y  =  [Yi,  -.Ys]  and  K  = 

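A special way of determining $c_i$, $i = 1, \ldots, n$, is quite simple: a subset
$\{X_i, i = 1, \ldots, n\}$ is randomly selected among $\{X_i, i = 1, \ldots, N\}$ and every
selected sample is directly used as a center vector, i.e., $c_i = X_i$, $i = 1, \ldots, n$,
and

$$ f_n(x) = \frac{\sum_{i=1}^{n} w_i \, \phi([x - X_i]^T \Sigma^{-1} [x - X_i])}{\sum_{i=1}^{n} \phi([x - X_i]^T \Sigma^{-1} [x - X_i])} \qquad (3) $$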


In the sequel, we call the RBF nets obtained by minimizing over all the
parameters the idealistic type nets, and we call the RBF nets given by eq.(3)
Type-I nets.

Let $(X, Y), (X_1, Y_1), \ldots, (X_n, Y_n)$ be independent identically distributed
$R^d \times R^m$-valued random vectors. Let $R(x) = E(Y \mid X = x)$ be the regression
function and $\mu$ be the probability measure of $X$. The kernel regression
estimator of $R(x)$ is defined as follows:


Mx) 


(4) 


where $h$ is the smoothing parameter and $K > 0$ is an integrable kernel on $R^d$.
Estimator eq.(4) is studied in [6, 7]. The probabilistic neural network proposed
by [11] is one type of direct extension of the Parzen window estimator.

Connections between RBF net and KRE
Let $K(r^2) = \phi(r^2)$, $\Sigma = \sigma^2 I$, $h^2 = \sigma^2$, and $w_i = Y_i$, $i = 1, \ldots, n$; we
see that eq.(4) is identical to eq.(3). That is, a spherically symmetrical kernel
$K(r^2)$ is a type of radial basis function, the smoothing parameter $h$ represents
the size $\sigma$ of the basis function's receptive field, and $Y_i$ acts as an approximate
solution of $w_i$. Thus, we can consider the KRE eq.(4) as a particular case of the
RBF net eq.(3). The assumption $\Sigma = \sigma^2 I$ is in fact commonly used in the existing
studies on RBF nets ([1], [9], [8], [2]).
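
The correspondence stated above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes scalar outputs (m = 1), the Gaussian basis function $\phi(r^2) = e^{-r^2}$, and a small made-up sample; the function names `normalized_rbf` and `kernel_regression` are illustrative only.

```python
import math

def phi(r2):
    """Gaussian basis function phi(r^2) = exp(-r^2)."""
    return math.exp(-r2)

def sq_dist(x, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def normalized_rbf(x, centers, weights, sigma):
    """Normalized RBF net with Sigma = sigma^2 * I and scalar weights."""
    basis = [phi(sq_dist(x, c) / sigma ** 2) for c in centers]
    return sum(w * b for w, b in zip(weights, basis)) / sum(basis)

def kernel_regression(x, xs, ys, h):
    """Kernel regression estimate with spherical kernel K(r^2) = phi(r^2)."""
    k = [phi(sq_dist(x, xi) / h ** 2) for xi in xs]
    return sum(y * ki for y, ki in zip(ys, k)) / sum(k)

# Hypothetical sample: centers are the samples, weights are the responses.
xs = [(0.0,), (1.0,), (2.0,)]
ys = [0.0, 1.0, 4.0]
x = (1.3,)
rbf_out = normalized_rbf(x, centers=xs, weights=ys, sigma=0.5)
kre_out = kernel_regression(x, xs, ys, h=0.5)
```

With $c_i = X_i$, $w_i = Y_i$ and $h = \sigma$ the two functions perform the same computation, so `rbf_out` and `kre_out` coincide.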

Furthermore, in parallel to eq.(2) we denote the total approximating error
of KRE eq.(4) by

and  for  eq.(2),  we  let  e%BF.,vi ‘*®"°**  ‘'’®  obtained  by  minimii' 
ing  uij,Ci,f  =  l,  -,n  and  E  simultaneously,  i.e.,  CnBr-tri  “  minunal 
error  obtainable  by  a  RBF  net  of  the  idealistic  type.  Let 
denote  for  a  RBF  net  of  Type-I  defined  by  eq.(3). 

Lemma  Let  A(r*)  =  d(e’),  we  have;  S 

<  £e}cfly(y./n);  W£jiBr-.r.(y./-)  ^ 
e'KRBiYJo);  Ee^BBF-«Ti('^^fn)  <  EcUf-Tst.-I^' M  f 
under  the  same  receptive  held  specihed  by  £  =  h  /;  (C)  CRBF-soit'''"'  - 
Ee%gf.-or^Y,fn)  < 
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ABSTRACT 

The  finite  sample  performance  of  a  nearest  neighbor  classi¬ 
fier  is  analyzed  for  a  two-class  pattern  recognition  problem.  An 
exact  integral  expression  is  derived  for  the  m-sample  risk  Rm 
given  that  a  reference  m-sample  of  labeled  points,  drawn  inde¬ 
pendently  from  Euclidean  n-space  according  to  a  fixed  proba¬ 
bility  distribution,  is  available  to  the  classifier.  For  a  family  of 
smooth distributions characterized by asymptotic expansions
in general form, it is shown that the m-sample risk $R_m$ has a
complete asymptotic series expansion $R_m \sim R_\infty + \sum_{k=2}^{\infty} c_k m^{-k/n}$
($m \to \infty$), where $R_\infty$ denotes the nearest neighbor risk in the
infinite-sample limit. Improvements in convergence rate are shown
under stronger smoothness assumptions, and in particular, $R_m =
R_\infty + O(m^{-2/n})$ if the class-conditional probability densities have
uniformly  bounded  third  derivatives  on  their  probability  one  sup¬ 
port.  This  analysis  thus  provides  further  analytic  validation  of 
Bellman’s  curse  of  dimensionality.  Numerical  simulations  cor¬ 
roborating  the  formal  results  are  included,  and  extensions  of  the 
theory  discussed.  The  analysis  also  contains  a  novel  application 
of  Laplace’s  asymptotic  method  of  integration  to  a  multidimen¬ 
sional  integral  where  the  integrand  attains  its  maximum  on  a 
continuum  of  points. 
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Abstract 

We  attempt  to  discover  the  role  and  relative  value  of  labeled 
and  unlabeled  samples  in  reducing  the  probability  of  error  of 
the  classification  of  a  sample  based  on  the  previous  observation 
of  labeled  and  unlabeled  data.  We  assume  that  the  underlying 
densities  belong  to  a  regular  family  that  generates  identifiable 
mixtures. 

The unlabeled observations, under the above conditions, carry
information about the statistical model and therefore can be
effectively used to construct a decision rule. When the training
set contains an infinite number of unlabeled samples, the first
labeled observation reduces the probability of error to within
a factor of two of the Bayes risk. Moreover, subsequent labeled
samples yield exponential convergence of the probability
of classification error to the Bayes risk. We argue that labeled
samples are exponentially more valuable than unlabeled samples
and identify the exponent as the Bhattacharyya distance.


Summary 


Assume we sample from two populations, $\theta = 1$ and $\theta = 2$, with
prior probabilities $\eta$ and $\bar\eta = 1 - \eta$. Let observations from population
1 be distributed according to density $f_1(x)$, with respect to some
measure $\mu$, and observations from population 2 according to $f_2(x)$.
We observe $l$ independent samples together with their classifications,
$\{(X_1, \theta_1), \ldots, (X_l, \theta_l)\}$, where the $\theta_i$ are Bernoulli($\eta$) and the $X_i$ are
i.i.d. $\sim f_{\theta_i}(x)$, and we observe $u$ unlabeled samples $\{X'_1, \ldots, X'_u\}$. The
totality constitutes the training set.

Let X be a sample, similarly drawn, which we wish to classify with
minimum probability of error. Let $R(l, u)$ be the probability of error
of a given decision rule when the training set is composed of $l$ labeled
and $u$ unlabeled samples.

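If $f_1(x)$, $f_2(x)$ and $\eta$ are known, the likelihood ratio test

$$ \text{Decide } \theta(X) = \begin{cases} 1 & \text{if } \dfrac{(1-\eta) f_2(X)}{\eta f_1(X)} < 1 \\[1ex] 2 & \text{if } \dfrac{(1-\eta) f_2(X)}{\eta f_1(X)} > 1 \end{cases} $$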

variance $\sigma^2$. Then any mixture of the form $\eta\,\varphi(\mu_1, \sigma_1) + \bar\eta\,\varphi(\mu_2, \sigma_2)$
can be uniquely decomposed into its component densities.

Thus $u = \infty$ yields the information that the underlying distributions
are either $(\eta f(x), \bar\eta g(x))$ or $(\eta g(x), \bar\eta f(x))$, but no information
is available on whether $f_1(x) = f(x)$ or $f_1(x) = g(x)$. Thus for $l = 0$
labeled samples,

$R(0, u) = 1/2$ for all $u$.

Labeled  data  are  therefore  needed.  The  first  labeled  sample  helps 
enormously. 

Theorem. When the training set contains an infinite number of
unlabeled samples, the first labeled observation yields a probability of
error

$R(1, \infty) = 2R^*(1 - R^*)$

for the classification of a new sample.

The expected probability of classification error is thus reduced to
within a factor of two of the Bayes risk.

Theorem. When the number of unlabeled samples is infinite, the
risk converges to the Bayes risk exponentially fast, i.e.

$R(l, \infty) = R^* + O(e^{-\alpha l})$

where $\alpha = -\log\left( 2\sqrt{\eta\bar\eta} \int \sqrt{f_1(x) f_2(x)} \, d\mu(x) \right)$.

Labeled samples can reduce the risk exponentially fast, but unlabeled
samples reduce the risk only polynomially fast. Under smoothness
conditions on the family $\mathcal{F}$, similar to those that allow efficient
estimation of parameters [3], there exists a procedure such that

$R(l, u) = R^* + O(1/u) + O(e^{-\alpha l})$.

Roughly speaking, labeled samples are exponentially more valuable
than unlabeled samples.
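
To make the two theorems concrete, the sketch below evaluates $2R^*(1 - R^*)$ and the exponent $\alpha$ for one hypothetical case that is not taken from the abstract: equal priors and unit-variance Gaussian densities centered at -1 and +1. The integral is approximated with a plain trapezoidal rule; the function names are illustrative.

```python
import math

def gauss_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and std. dev."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def bhattacharyya_exponent(f1, f2, eta, lo=-12.0, hi=12.0, steps=24000):
    """alpha = -log(2*sqrt(eta*(1-eta)) * integral of sqrt(f1*f2)), trapezoidal rule."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        w = 0.5 if i in (0, steps) else 1.0  # end points get half weight
        total += w * math.sqrt(f1(x) * f2(x))
    integral = total * dx
    return -math.log(2.0 * math.sqrt(eta * (1.0 - eta)) * integral)

# Assumed example: eta = 1/2, f1 = N(-1, 1), f2 = N(+1, 1).
eta = 0.5
f1 = lambda x: gauss_pdf(x, -1.0, 1.0)
f2 = lambda x: gauss_pdf(x, 1.0, 1.0)

# Bayes risk for this symmetric case is Phi(-1), the standard normal CDF at -1.
bayes_risk = 0.5 * math.erfc(1.0 / math.sqrt(2.0))
risk_one_label = 2.0 * bayes_risk * (1.0 - bayes_risk)  # R(1, infinity)
alpha = bhattacharyya_exponent(f1, f2, eta)
```

For this choice the integral has the closed form $e^{-1/2}$, so `alpha` should come out close to 0.5, and `risk_one_label` lies between the Bayes risk and twice the Bayes risk.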
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minimizes  the  probability  of  error  (Bayes  risk)  which  is  equal  to 
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If $f_1(x)$ and $f_2(x)$ belong to a regular family $\mathcal{F}$, the distributions and
the prior probabilities may be estimated from the labeled data. If an
infinite number of labeled samples is available, the risk is given by
$R(\infty, u) = R^*$ for any number $u$ of unlabeled samples.

The distributions and prior probabilities can also be estimated using
the unlabeled observations, under the additional hypothesis that the
family of mixtures [1] generated by $\mathcal{F}$ is identifiable [2]. For example,
$\mathcal{F}$ can be the family of Gaussian distributions with mean $\mu$ and
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ABSTRACT  We  address  the  problem  of  estimating  the  error  prob¬ 
ability  of  nonparametric  classification  rules.  Instead  of  the  well 
known  counting-type  estimators  we  propose  a  so  called  posterior 
probability  estimator,  which  plugs  a  nonparametric  estimate  of 
the  a  posteriori  probabilities  into  an  algebraic  expression  of  the 
error probability. We explore the properties of the plug-in estimator.
Unlike the standard estimators, the variance of our estimator
is shown to have some remarkable distribution-free properties
for the k-nearest neighbor, kernel and histogram rules. We pay
special attention to histogram classification rules, and show the
consistency  of  the  estimate  in  this  case.  Investigating  the  bias  of 
the  estimate  we  also  obtain  rate-of-convergence  results  under  mild 
conditions  on  the  distribution. 

I.  INTRODUCTION 


Let the random variable pair $(X, Y)$ take its values from $R^d \times
\{0, 1\}$. It is well known that the decision rule $g : R^d \to \{0, 1\}$
which minimizes the error probability $\Pr\{g(X) \neq Y\}$ is given by
the Bayes decision:

$$ g^*(x) = \begin{cases} 0 & \text{if } P_0(x) > P_1(x) \\ 1 & \text{otherwise} \end{cases} $$

($x \in R^d$), where the $P_i(x) = \Pr\{Y = i \mid X = x\}$, $i = 0, 1$, are the a
posteriori probabilities. In practice the $P_i(x)$ are not known, but
a training sample of n independent identically distributed (i.i.d.)
random variable pairs

$$ \xi_n = ((X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)) $$

is given, where the $(X_i, Y_i)$ have the same distribution as that of
$(X, Y)$, and $\xi_n$ is independent of $(X, Y)$. Most nonparametric
classification rules can be formulated as


The formal definition of the estimator that we investigate is
the following: Let $\xi_{n,j}$ denote the training sequence with $(X_j, Y_j)$
deleted and $P_i^{(n-1)}(x, \xi_{n,j})$ the estimate of the a posteriori
probability $P_i(x)$ from the data $\xi_{n,j}$. Then our estimate for the error
probability is


(3) 


The most remarkable property of the estimate is summarized in the
following result:


Theorem 1 For any histogram, kernel and k-nearest neighbor
classification rule, for all n and $\epsilon > 0$

-  ELiT’l  >  1}  <  2e-‘="'’ 


and 

regardless  of  the  distribution  of  {X,  Y),  where  the  constant  c  de¬ 
pends  on  the  dimension  only. 

The proof is based on McDiarmid's extension of Azuma's martingale
inequality and geometrical considerations. The above theorem
suggests that using the proposed estimate could be favourable
in practice as compared to counting-type estimates. To get more
insight into its properties we also investigated the bias of the
estimate. Here we list some of the results that we obtained for the
case of the cubic histogram classification rule with cube size h > 0.
The first of them shows that, under the usual conditions on h, the
estimate is strongly consistent for all distributions.

Theorem  2  For  the  cubic  histogram  rule  for  all  distributions 

-  0 

with probability one whenever $h \to 0$ and $nh^d \to \infty$ as $n \to \infty$.


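$$ g_n(x) = \begin{cases} 0 & \text{if } P_0^{(n)}(x) > P_1^{(n)}(x) \\ 1 & \text{otherwise,} \end{cases} \qquad (1) $$

where $P_i^{(n)}(x) = P_i^{(n)}(x, \xi_n)$ is an estimate of $P_i(x)$ from the
sample $\xi_n$. The error probability of the rule $g_n$ is given by

$$ L_n = \Pr\{g_n(X) \neq Y \mid \xi_n\} = 1 - E(P_{g_n(X)}(X) \mid \xi_n). \qquad (2) $$

Examples of this kind of classification rule are the histogram,
k-nearest neighbor and kernel rules.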

It is always a crucial question to estimate the error probability
of the classification rule. The most standard methods are
based on counting the number of errors on the training data. Such
estimates are the resubstitution, holdout and deleted (leave-one-out)
estimates. These estimates usually have small bias, but their
variance can be undesirably large. Another possible strategy is
plugging an estimate of the a posteriori probabilities into the
expression (2) of the error probability. In the case of nonparametric
rules of form (1) it is natural to use the estimates that define the
classification rule.
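
The contrast between counting errors and plugging in posterior estimates can be illustrated on a one-dimensional histogram rule. This is only a sketch under stated assumptions: the exact form of the authors' estimator is not reproduced here, so the code assumes the posterior-probability estimate averages $1 - P^{(n-1)}_{g}(X_j)$ over the sample with each point deleted from its own cell, and it assigns posterior 1/2 to an empty cell. The data and the function name are made up.

```python
import math
from collections import defaultdict

def deleted_histogram_estimates(data, h):
    """Return (counting_estimate, posterior_estimate) for a 1-D histogram rule.

    data: list of (x, y) pairs with y in {0, 1}; h: cell width.
    Both estimates use leave-one-out (deleted) cell counts.
    """
    counts = defaultdict(lambda: [0, 0])
    for x, y in data:
        counts[math.floor(x / h)][y] += 1

    errors = 0
    posterior_sum = 0.0
    for x, y in data:
        n0, n1 = counts[math.floor(x / h)]
        # Delete the point itself from its cell.
        if y == 0:
            n0 -= 1
        else:
            n1 -= 1
        total = n0 + n1
        # Assumption: an empty cell gives posterior 1/2 to each class.
        p0 = n0 / total if total else 0.5
        p1 = 1.0 - p0
        decision = 0 if p0 > p1 else 1
        errors += int(decision != y)                      # counting-type term
        posterior_sum += 1.0 - (p0 if decision == 0 else p1)  # plug-in term
    n = len(data)
    return errors / n, posterior_sum / n

# Hypothetical sample with two cells of width 1.
data = [(0.1, 0), (0.2, 0), (0.3, 1), (0.4, 1),
        (1.1, 1), (1.2, 1), (1.3, 1), (1.4, 0)]
counting, posterior = deleted_histogram_estimates(data, h=1.0)
```

On this toy sample the counting estimate is 5/8 while the posterior-probability estimate is 7/24; the plug-in value changes smoothly with the cell proportions rather than in steps of 1/n.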


The  next  one  is  an  interesting  property. 

Theorem  3  For  the  histogram  rule  EL\t^  <  ELn,  that  is,  L\!’^ 
is  always  optimistically  biased. 

The  next  rate-of-convergence  results  provide  more  insight  into  the 
behavior  of  the  estimate: 

Theorem  4  If  the  distribution  of  X  is  of  compact  support,  then 
for  the  histogram  rule  there  exists  a  constant  C  such  that  for  all  n 

If  in  addition  the  support  of  the  distribution  is  convex  and  satisfies 
p(A)  >  mh*  for  all  measurable  A  for  some  m  >  0,  then 

for some $C_1$. If $P_0(x)$ is uniformly Lipschitz, that is, for some L,
$|P_0(x) - P_0(y)| \le L \|x - y\|$ for any x, y, then

-  L„|  <  Lh. 
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Abstract: A Bayesian approach for classification of Markov
sources is developed and studied. Each of M sources is described
by a continuous-time, discrete-state Markov chain. All states and
times of transitions between states can be observed perfectly, but
the transition rate matrices which establish the parameters of the
sources are not known a priori. A Bayesian training algorithm
using a fixed amount of memory digests the training samples that
consist of a member function from each chain. This leads to an
iterative, computationally simple classification algorithm.

Extended Summary

The general Bayesian classification problem can be described
as follows. Let $\theta_s \in \Theta$ be the parameter set of the s-th source,
$1 \le s \le M$, where $\Theta$ is the parameter space. We consider
the unknown parameter sets $\theta_s$ as both independent of each
other and of the active source, and as identically distributed random
variables, each governed by a prior probability density function
(PDF) $p(\theta_s)$, $s = 1, 2, \ldots, M$. Let $y_s = (y_{s1}, y_{s2}, \ldots, y_{s m_s})$,
$s = 1, 2, \ldots, M$, be a training sequence from the s-th source. Let
$x = (x_1, x_2, \ldots, x_n)$ be a test sequence to be classified,
produced by one of the M sources, henceforth called the active
source. The index of the active source is unknown and considered
to be a discrete random variable $\omega$ taking values $\{1, 2, \ldots, M\}$.

The classification problem is that of identifying the active
source upon observing x and $Y = \{y_1, y_2, \ldots, y_M\}$. The Bayes
decision rule $\delta(x|Y)$, which is optimal in the sense of minimizing
the error probability of classification, can be defined as

$\delta(x|Y) = s$ if $p(H_t)\, p(x|y_t, H_t)$
is maximum for $t = s \in \{1, 2, \ldots, M\}$ (1)

Note that the decision rule $\delta(\cdot)$ is a partition of the observation
space $U_n$ of all possible test sequences x into M disjoint
regions $R_1, R_2, \ldots, R_M$ whose union equals $U_n$. Therefore the
conditional error probability $P_\delta(e|Y)$ associated with a decision rule
$\delta = \delta(x|Y)$ is defined as

P<(e|y)  =  P(x|yj,  Hi)  (2) 


where $\bar R_i$ is the complement of $R_i$, $p(H_i)$ is the prior probability
of the i-th source and $p(x|y_i, H_i)$ is the conditional probability
density of x given that both the training sequence $y_i$ and x are
generated from the i-th source.

The posterior densities $p(x|y_i, H_i)$ in (2) are difficult to compute
in general and depend on the prior densities of the unknown
parameters, which are usually unavailable. Recently, Merhav and
Ziv [1] developed a suboptimal Bayesian test statistic for
classification of discrete-time, discrete-state Markov sources whose
transition probabilities are not known explicitly. They showed
that the test does not require knowledge of prior densities and
achieves, within a constant factor, the minimum error probability
in the Bayesian sense. Unfortunately, the main assumption of the
approach, that the prior density $p(\cdot)$ must be bounded, i.e. for all
$\theta \in \Theta$,

$0 < p_{\min} \le p(\theta) \le p_{\max} < \infty$,

does not allow this method to be applied to problems whose
parameter sets have infinite support.

The study reported in this paper defines and solves a classification
problem based on a Bayesian approach involving information
sources each described by a continuous-time Markov chain, whose
sample functions can be observed directly but whose parameters
are not known a priori. As opposed to the approach presented in
[1], the posterior densities in (2) are computed exactly by choosing
appropriate prior density functions for $\theta$. We show that a natural
conjugate prior density exists for the problem investigated here
and that the posterior PDFs have the same functional forms as
the prior PDFs. Thus the classification analysis can be done by
operating solely on the parameters of the prior densities updated
by the training sequences. Natural conjugate PDFs form a rich
class of distributions, giving the classifier considerable flexibility
in choosing a prior density for $\theta$ by setting suitable values for
the prior parameters. The decision making and training algorithms
derived in this way are optimal in the Bayesian sense, as well
as computationally simple and recursive, and require a fixed amount
of computer storage regardless of the number of features in x and Y.
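
The idea of training by updating prior parameters can be sketched as follows. The abstract does not name the conjugate family, so this example assumes independent Gamma(shape, rate) priors on the off-diagonal transition rates $q_{ij}$, which is a standard conjugate choice for exponentially distributed holding times; the path format and function names are hypothetical.

```python
from collections import defaultdict

def sufficient_statistics(path):
    """Summarize a fully observed path of a continuous-time Markov chain.

    path: list of (state, holding_time); the last holding time may be
    censored (no jump is recorded after it).
    Returns (total time spent in each state, number of i -> j jumps).
    """
    time_in_state = defaultdict(float)
    jumps = defaultdict(int)
    for idx, (state, hold) in enumerate(path):
        time_in_state[state] += hold
        if idx + 1 < len(path):
            jumps[(state, path[idx + 1][0])] += 1
    return time_in_state, jumps

def update_gamma_prior(prior, path):
    """Conjugate update: Gamma(a, b) on q_ij becomes Gamma(a + N_ij, b + T_i).

    prior: dict mapping (i, j), i != j, to a (shape, rate) pair.
    """
    time_in_state, jumps = sufficient_statistics(path)
    return {
        (i, j): (shape + jumps[(i, j)], rate + time_in_state[i])
        for (i, j), (shape, rate) in prior.items()
    }

# Hypothetical two-state training path and Gamma(1, 1) priors.
prior = {(0, 1): (1.0, 1.0), (1, 0): (1.0, 1.0)}
path = [(0, 1.0), (1, 0.5), (0, 2.0), (1, 1.5)]
posterior = update_gamma_prior(prior, path)
posterior_mean = {ij: shape / rate for ij, (shape, rate) in posterior.items()}
```

Only the jump counts and the occupation times are stored, so the memory needed does not grow with the length of the training path, in line with the fixed-memory property described above.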

References

[1] N. Merhav and J. Ziv, "A Bayesian approach for classification of
Markov sources," IEEE Trans. Inform. Theory, vol. 37, no. 4, July
1991.
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A  Computer  Algebra  Algorithm  for  the  Adjoint  Divisor 

D.  Polemi,  M.  Hassner,  O.  Moreno,  and  C.J.  Williamson 


Abstract. Using the algorithm in [1] for desingularizing a
singular plane curve, we describe a polynomial time algorithm
which can be used for computing the adjoint divisor, finding
the genus, or adding points on the Jacobian of the curve. We
also give a complexity analysis of this algorithm. The
algorithm can be implemented in the IBM computer algebra
system SCRATCHPAD.

Summary. We denote by $k = F_q$ a finite field of $q = p^e$
elements, where p is a prime number, and by $\bar k$ the algebraic
closure of k. Consider a projective plane singular curve
C of degree d, defined over k, specified by a polynomial
$f \in k[x, y]$ which is irreducible in $\bar k[x, y]$. By definition
C is the variety $C = \{ (a, b) \in \bar k^2 : f(a, b) = 0 \}$.

The affine coordinate ring k[C] of C is the quotient ring
$k[x, y]/I$, where I is the principal ideal generated by f. We
denote the field of rational functions of C by k(C). The
following is an outline of a construction for the adjoint divisor.
Step A. Construct an affine non-singular model $\tilde C$
(smooth curve) of the curve C.

For  this  step  we  use  the  algorithm  in  [1]  which  can  be  out¬ 
lined  as  follows: 

1. Compute an integral basis $\{u_0, \ldots, u_n\}$ of the integral
closure $\overline{k[C]}$ of k[C] considered as a k[X]-module. (k[X]
is the ring of polynomials of one variable.)

2.  Introduce  new  variables  {X,  Fi, ...,  VI,}  which  corre¬ 

spond  to  the  basis  {uo,...,Wn}-  Then  the  non-singular 
model  is  C  =  {  a  G  k  /,^(a)  =  OVi.j},  where 

1  <  i,j  <  n,  fij  €  k[X,  Fi , ...,  y„]  are  polynomials  of 
the  form  fa  =  YiYj  -  _Ea=o ^  W- 
The  coordinate  ring  of  C  is  k[^  =  k[X,  Vi,  ...,y„]/(/,j). 
There  is  a  natural  map  x  ;  C  — *  C  inducing  the  map 
T*  ;  k(C)  — »  k(C)  on  the  field  of  rational  functions  of  the 
curves  [l](p.9). 

Step  B.  Compute  the  points  Ptj  €  5r“'(Ft)  lying  over 
the  singular  points  P*  =  (ot,/3*)  G  C. 

In  particular,  find  the  solutions  of  the  systems  of  equations 
A}(at,Yt,  ...,Y„)  =  0  where  f  =  =  l,...n. 

Step  C.  Compute  the  adjoint  divisor  of  the  curve  C. 

i.  Form  the  differential  w  =  x*  ( a/%y ) 
differentials  Q(C). 

ii.  Compute  the  local  unifonnizing  parameter  (l.u.p.) 
around  each  point  Pt,(  G  x“‘(Pt)  by  forming  the  matrix 

^  ^  ^  ®  some  i,  then  Yi 

is  the  l.u.p.  around  Ptj. 

iii.  Express  the  variables  X,  Y\  ,...y„  in  terms  of  the  l.u.p. 
Yi,  and  estimate  the  orders,  ordp^  ,X,  ordp,,  ,yj  j  =  l,...n. 
Furthermore,  compute  the  orders,  ordp,  ,w,  of  w  at  Pt,(. 

iv.  The  adjoint  divisor  of  the  curve  C  at  P*  is 
=  Et,/  -ordp»  ,w  Pkj,  of  degree  6*  =  Et,/  -ordp»  ,w. 


The  adjoint  divisor  of  C  is  A  =  degree 

6  =  Et  >  furthermore  6  =  2dim|^  k[C]/k[C]. 

Step  D.  Find  the  genus  of  the  curve  C. 

Using  the  computation  of  6  in  Step  C,  the  genus  of  C  is 
g  _  —  Ls.  The  time  complexity  of  the  above  al¬ 

gorithm  is  of  the  order  0(g^r^cf®log^9),  where  r  is  the  mul¬ 
tiplicity  of  C  at  its  worst  singular  point;  it  is  better  than 
other  algorithms  since  the  desingularization  technique  in 
Step  A  does  not  require  field  extensions. 

Example.Let  C  :  f(x,y)  =  +  xy  +  y*  =0  over  k  =  Fj 

with  an  ordinary  singular  point  Pi  =  [0, 0, 1]  of  multiplicity 
r  =  2,  and  k[C]  =  k[z, y]/(x^  +  xy  +  y*)-  An  integral  basis  of 
k[C]  is  and  C  =  {a  G  k^:  ftj(a)  =  0),  where 

/„  =  y*  -  yj,  /,2  =  YiYi-  xYs,  /i3  =  YiY3-(x  +  Yi), 
fz2  =  Yf  - 1*  -  xYi,  /m  =  Y2Yi-Y2-  xYi,  hs  =  Yi  -  y*  -  ya 
and  T~*(Pi)  =  {Pii  =  (0,0, 0,0),  P\2  =  (0,0,0, 1)}.  The  l.u.p. 
around  Pi,  and  P,2  is  y,.  We  rewrite  Yi  =  Yi,  Yi  =  Yi  -  Yi, 
and  X  =  yya  —  y,.  Thus  otdp,,y2  =  2,  ord/>,,y2  =  2, 
ordp,,y3  =  2  and  ordpjjTa  =  2,  ordp,,i  =  1  and  ordp,jX  =  1. 
Let  u;  =  dx/x,  then  ordp,,u>  =  —1  and  ordp,jU;  =  —1.  Hence 
A  =  Pii  +  P,2  and  g  =  1. 

Following methods similar to those of Huang and Ierardi
[3] (§5), adding points on the Jacobian of a plane singular
curve can also be done using our algorithm.

We can generalize the algorithm of [1], which is for
desingularizing plane curves, to the case of an arbitrary curve,
using methods similar to those in [4]. In this way we can
generalize our algorithms above. Furthermore, since the techniques
of [4] depend heavily on the theorem of the primitive
element, we can also obtain an effective method to generalize
our results in the case of an arbitrary curve, using [2].
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1  Introduction 

In this contribution we give an algebraic approach for coding over
two-dimensional signal space for QAM-like constellations. The
two main points are an isomorphic mapping of fields GF(p),
$p \equiv 1 \bmod 4$, onto a subset of the Gaussian integers, and a new
two-dimensional modular distance called Mannheim distance.

2  Mannheim Distance and Error Correcting Codes

Gaussian integers are those complex numbers which have integers
as real and imaginary parts (for Gaussian integers see e.g. [2],
pp. 182-187). Primes of the form $p \equiv 1 \bmod 4$ can be written in
exactly one way as a sum of two squares. Hence such primes p
are the product of two conjugate complex Gaussian integers:
$p = a^2 + b^2 = \pi \cdot \pi^*$, where $\pi = a + i \cdot b$ and * denotes complex
conjugation, $\pi^* = a - i \cdot b$. Let $[\cdot]$ denote rounding to the closest
integer and define rounding of a complex number by
$[x + iy] = [x] + i[y]$. Then the modulo function

$$ \mu(g) = \gamma = g \bmod \pi = g - \left[ \frac{g \cdot \pi^*}{\pi \cdot \pi^*} \right] \cdot \pi $$

maps $GF(p) \to G_\pi$, where $G_\pi$ denotes the residue class of the
Gaussian integers modulo $\pi$. In Figure 1, $G_{3+2i}$ is displayed.

Figure 1: $G_{3+2i}$


Clearly the inverse mapping then immediately follows as
$g = \mu^{-1}(\gamma) = (\gamma \cdot v \pi^* + \gamma^* \cdot u \pi) \bmod p$, where $1 = u\pi + v\pi^*$.

To profit algebraically from the representation of GF(p) as
Gaussian integers, we introduce a two-dimensional modular distance
which we call Mannheim distance. The Mannheim distance
between two Gaussian integers $\alpha$ and $\beta$ is defined as
$d_M(\alpha, \beta) = |\mathrm{Re}\{\gamma\}| + |\mathrm{Im}\{\gamma\}|$, where $\gamma = (\beta - \alpha) \bmod \pi$, i.e. the
Mannheim distance is the well-known Manhattan distance modulo
a two-dimensional grid. (The streets/avenues of Manhattan and
Mannheim form a rectangular grid.) In a straightforward way we
define the Mannheim weight of $\gamma \in G_\pi$ as $w_M(\gamma) = d_M(\gamma, 0)$, and


the Mannheim weight of a vector $r = (r_0, r_1, \ldots, r_{n-1})$ over $G_\pi$
as $w_M(r) = \sum_j w_M(r_j)$. Similar to the usual Hamming distance
codes, we characterize linear Mannheim error correcting codes by
the triple $[n, k, d_M]$, where n is the length, k the dimension, and

$$ d_M = \min\{ w_M(c) \mid c \neq 0,\ c \in C \} $$

the minimum Mannheim distance of the code. We start with the
design of perfect icyclic One Mannheim Error Correcting (OMEC)
codes which are able to correct errors of Mannheim weight one.
Then Mannheim error correcting codes having $d_M > 3$ are designed,
together with decoders working in a similar way as those for
Berlekamp's negacyclic codes for the Lee distance (see [1],
pp. 207-217). The codes are 90-degree rotationally invariant and are
very easy to encode and decode. Synchronization is also very easy.
The coding gains which can be achieved are considerable, in
particular if simple code concatenations are considered. For primes
$p \equiv 3 \bmod 4$ we can use $G_p$, which is isomorphic to $GF(p^2)$.

To  give  the  flavour  of  the  ideas,  consider  the  perfect  [n,  n  — 1,3] 
OMEC-codes  defined  by  the  parity  check  matrix 

. 

where  a  is  a  primitive  element  of  Gw-  Hence  6  {±*}i 
errors  of  Mannheim  weight  <  1  will  produce  different  syndromes. 
Decoding is straightforward: take the received vector $r = c + e$
and compute the syndrome $s = H \cdot r^T$. The location of an error
having $w_M(e) = 1$ is then given by $l = \log_\alpha s \bmod n$ and its
value by $s \cdot \alpha^{-l}$.

Example Let $p = 13$, $\pi = 3 + 2i$, and $\alpha = 1 + i$; then
$H = (1, 1+i, 2i)$.

Let us assume that at the receiving end we get the vector
$r = (1+i, i, -1+i)$; then $s = H \cdot r^T = -2 = \alpha^{11}$ and we find that at
position $l = 2 = 11 \bmod 3$ we have an error of value $s \cdot \alpha^{-2} = i$, so
$e = (0, 0, i)$ and $c = r - e = (1+i, i, -1)$.
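
The example can be replayed in a few lines of integer arithmetic. The sketch below represents a Gaussian integer as a pair (real, imaginary) and implements the modulo function with nearest-integer rounding. Instead of the logarithm-based rule it locates the error by brute force, trying each position and each unit error value in {1, -1, i, -i}; this is a substitute chosen for brevity, and all function names are illustrative.

```python
UNITS = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # error values of Mannheim weight one

def gmul(a, b):
    """Product of two Gaussian integers given as (re, im) pairs."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def round_div(num, den):
    """Nearest integer to num/den for den > 0 (no ties occur for odd den)."""
    return (2 * num + den) // (2 * den)

def gmod(g, pi):
    """g mod pi = g - [g * conj(pi) / (pi * conj(pi))] * pi."""
    a, b = pi
    norm = a * a + b * b
    t = gmul(g, (a, -b))
    q = (round_div(t[0], norm), round_div(t[1], norm))
    qp = gmul(q, pi)
    return (g[0] - qp[0], g[1] - qp[1])

def mannheim_weight(g, pi):
    """|Re| + |Im| of the reduced representative of g modulo pi."""
    r = gmod(g, pi)
    return abs(r[0]) + abs(r[1])

def syndrome(H, r, pi):
    """s = H . r^T reduced modulo pi."""
    s = (0, 0)
    for h, x in zip(H, r):
        p = gmul(h, x)
        s = (s[0] + p[0], s[1] + p[1])
    return gmod(s, pi)

def decode_omec(H, r, pi):
    """Correct a single error of Mannheim weight one by exhaustive search."""
    s = syndrome(H, r, pi)
    if s == (0, 0):
        return list(r)
    for pos, h in enumerate(H):
        for e in UNITS:
            if gmod(gmul(h, e), pi) == s:
                c = list(r)
                c[pos] = gmod((r[pos][0] - e[0], r[pos][1] - e[1]), pi)
                return c
    raise ValueError("error pattern has Mannheim weight greater than one")

pi = (3, 2)                       # pi = 3 + 2i, p = 13
H = [(1, 0), (1, 1), (0, 2)]      # H = (1, 1+i, 2i)
r = [(1, 1), (0, 1), (-1, 1)]     # received vector (1+i, i, -1+i)
s = syndrome(H, r, pi)
c = decode_omec(H, r, pi)
```

Here `s` is the pair for -2 and `c` is the pair list for (1+i, i, -1), matching the example; `gmod((5, 0), pi)` returns the pair for i, which shows how the field element 5 of GF(13) is mapped into $G_{3+2i}$.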
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Let G be a finite group with a multiplicative operation and identity
element e. A block code C of length n over G is any non-empty subset
of the n-fold direct product $G^n$, i.e., of the set of all the n-tuples of
group elements. We assume the group order |G| to be finite. The
dimension of a code C is $k = \log_{|G|} |C|$ symbols per block, where |C| is
the code size, bounded above by $|G|^n$. The code rate is r = k/n. The
Hamming distance between two code words is the number of positions
in which they differ. Let $\mathcal{I}$ denote the index set of the n-tuples of C.
An information set of C [1] is any index subset $\mathcal{J} \subset \mathcal{I}$ of size $|\mathcal{J}| = k$
such that every k-tuple of elements of G occurs in $\mathcal{J}$ precisely once as
the code words run through C. Codes exist without an information set.

The direct product of G by itself n times, say $G^n$, forms a group.
A linear block code over G is a subset of $G^n$ that forms a group, i.e.,
is a subgroup of $G^n$. This paper is devoted to the description of such
subgroups.

In algebraic coding theory, the "classical" construction of linear
codes concatenates a k-tuple of information symbols with n - k check
symbols chosen so as to satisfy certain linear parity check equations. We
show that this construction can be mimicked to generate linear codes
over groups.

Definition 1. An (n,k) systematic block code C with block length n
and dimension k over a group G is a subgroup of $G^n$ with order $|G|^k$
formed by the n-tuples

$(x_1, x_2, \ldots, x_k, y_1, \ldots, y_{n-k})$ (1)

with $y_i = \Phi_i(x_1, x_2, \ldots, x_k)$, where the $\Phi_i$ are (n - k) maps of $G^k$ into G.

For linearity, the maps must be homomorphisms:

Proposition 1. The (n,k) systematic code with code words (1) is
linear if and only if the maps $\Phi_i$ are homomorphisms of $G^k$ into G.
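
Proposition 1 can be illustrated on a small made-up code. The sketch below builds a (5,2) systematic code over the cyclic group $Z_5$, written additively, whose three check symbols are the homomorphisms $x_1 + a x_2 \bmod 5$ for a = 1, 2, 3. The coefficients are hypothetical and chosen only for illustration; this is not a code taken from the paper.

```python
from itertools import product

T = 5                                   # order of the cyclic group Z_5
COEFFS = [(1, 1), (1, 2), (1, 3)]       # hypothetical check homomorphisms

def encode(x):
    """Append the n - k check symbols to the information k-tuple x."""
    checks = tuple(sum(a * xi for a, xi in zip(row, x)) % T for row in COEFFS)
    return tuple(x) + checks

def add(u, v):
    """Componentwise group operation of Z_5^n (additive notation)."""
    return tuple((a + b) % T for a, b in zip(u, v))

code = {encode(x) for x in product(range(T), repeat=2)}

# Linearity: the code is closed under the group operation.
is_closed = all(add(u, v) in code for u in code for v in code)

# For a group code the minimum distance is the minimum nonzero weight.
min_distance = min(sum(1 for s in c if s != 0) for c in code if any(c))
```

Because each check map is a homomorphism of $Z_5^2$ into $Z_5$, the 25 code words form a subgroup of $Z_5^5$; for these particular coefficients the minimum Hamming distance works out to 4.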

A  more  compact  definition  of  a  linear  code  can  now  be  provided. 
We  denote  by  X*"**  the  elements  of  O’". 

Definition  2.  A  linear  (n,fc)  systematic  block  code  C  over  a  group  G 
is  the  image  of  an  endomorphism  f  of  0"; 

«(x“>  I  =  (x<*)  I  [♦(X)]*"-*') 

where $\Phi$ is a homomorphism of $G^k$ into $G^{n-k}$.

The  following  proposition  shows  that  the  actual  algebraic  structure 
of  a  linear  code  does  not  carry  information  about  the  properties  of  the 
code  itself. 

Proposition  2.  All  linear  (n,fc)  systematic  codes  over  the  same  group 
G  are  isomorphic. 

We are especially interested in linear codes that cannot be obtained
by concatenating shorter codes. A group G is called indecomposable [2,
p. 121] if $G \neq \{e\}$, and if $G = H \times K$ implies either $H = \{e\}$ or $K = \{e\}$.
Consequently, we define linear indecomposable codes as follows.

Definition 3. A linear (n,k) systematic block code is called
indecomposable if it is an indecomposable group.

The code words of a decomposable code can be written (possibly
after reordering its components) as the concatenation of two words,
each one with its information set and such that the parity check symbols
depend only on one word or the other:

$$(x_1, \ldots, x_{k_1}, y_1, \ldots, y_{r_1} \mid x_{k_1+1}, \ldots, x_k, z_1, \ldots, z_{r_2})$$

with $n = k + r_1 + r_2$, and each $y_i$ depends only on $x_1, \ldots, x_{k_1}$, while each
$z_j$ depends only on $x_{k_1+1}, \ldots, x_k$. If this is the case, we write $C = C_1 \times C_2$.


From the point of view of Hamming distance, linear codes over
non-abelian groups $G$ are bad. In fact, we prove that under certain mild
conditions any $(n,k)$ systematic linear code over $G$ is decomposable
into $k$ repetition codes, and consequently we have the following upper
bound to its minimum Hamming distance:


Bound (2) improves upon a previous result by Forney [1], who proved
that $d \le n - 2k + 2$ for $k \le n/2$ and $d = 1$ for $k > n/2$.

By examining in more detail the construction of linear codes over
abelian groups, we prove that they can be characterized by a
parity-check matrix $H$ that describes the parity-check symbols $y_1, \ldots, y_{n-k}$
in (1) by expressing them as powers of the generators of $G$.

The simplest case is that of a cyclic $G$.

Theorem 1. Consider the $(n,k)$ linear code $C$ over the cyclic group $Z_t$
of order $t$. The parity check matrix $H$ is an $(n - k) \times n$ echelon matrix
over the ring $Z_t$. The minimum Hamming distance $d$ of the code is
equal to the minimum number of linearly dependent columns of $H$.


Example 1. Let $n = 5$ and $k = 2$. A (5,2,3) code over $Z_2$ is defined
by the parity check matrix over $Z_2$

$$H = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \end{bmatrix}$$

It is easy to check that the minimum number of linearly dependent
columns is 3. Thus, the minimum Hamming distance of the code is 3.
If $g$ denotes a generator of $Z_2$, the code words have the form

$$(g^{x_1}, g^{x_2}, g^{x_1 + x_2}, g^{x_2}, g^{x_1}).$$
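The distance claimed in Example 1 can be checked by brute force. The following Python sketch assumes the echelon form $H = [P \mid I]$ shown above, so that a code word $(x, y)$ satisfies $Px + y = 0$ over $Z_t$. The function name `min_distance` and the variable `P` are illustrative and do not come from the paper.

```python
from itertools import product

def min_distance(P, t):
    """Minimum Hamming weight of the code {(x, y) : P x + y = 0 (mod t)} over Z_t.

    For a linear code the minimum distance equals the minimum nonzero weight.
    """
    k = len(P[0])
    best = None
    for x in product(range(t), repeat=k):
        if not any(x):
            continue  # skip the all-zero information word
        # Check symbols are forced by the parity check equations.
        y = [(-sum(p * xi for p, xi in zip(row, x))) % t for row in P]
        weight = sum(v != 0 for v in list(x) + y)
        best = weight if best is None else min(best, weight)
    return best

# Left block of the parity check matrix of Example 1 (first two columns of H).
P = [[1, 1], [0, 1], [1, 0]]
print(min_distance(P, 2))
```

Exhaustive search is only practical for very small $k$ and $t$; it is meant as a sanity check, not as a method from the paper.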

A  similar  theorem  can  be  proved  for  general  abelian  groups. 


Theorem 2. Consider the linear systematic $(n,k)$ code $C$ over an
abelian group $A$ of exponent $d_m$. The parity-check matrix $H$ is an
$m(n - k) \times mn$ echelon matrix over $Z_{d_m}$. The minimum Hamming
distance $d$ of the code is given by the minimum number of linearly
dependent columns in matrix $H$ over the ring $Z_{d_m}$, where certain sets
of columns are counted as 1.


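Example 2. Let $n = 5$ and $k = 2$. A (5,2,3) linear code over $Z_2 \times Z_4$
exists. The code is defined by the parity check matrix over $Z_4$

$$H = \left[\begin{array}{rrrr|rrrrrr}
1 & 0 & 1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & -1 & 0 & 0 & 0 & 0 \\
1 & 3 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & 0 \\
3 & 1 & 2 & 2 & 0 & 0 & 0 & -1 & 0 & 0 \\
1 & 1 & 2 & 0 & 0 & 0 & 0 & 0 & -1 & 0 \\
0 & 1 & 1 & 2 & 0 & 0 & 0 & 0 & 0 & -1
\end{array}\right]$$

Direct computation shows that the minimum Hamming distance of this
code is 3. Its code words have the form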

jri+3zi  +«  jii+Tj+Jij  j 
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Abstract 

The cycle join algorithm is applied to construct deBruijn
sequences from irreducible cyclic codes. The number of sequences
obtained by this construction is shown to be related to the
cyclotomic numbers. The Matrix-tree theorem and Gaussian sums give a
bound on the number of sequences constructed in this way.

Introduction 

A binary deBruijn sequence of length $2^n$ is a sequence such
that all $n$-tuples occur exactly once in the sequence. The binary
deBruijn graph of order $n$ is a directed graph with $2^n$ nodes, each
labeled with a unique binary $n$-dimensional vector, and an edge
from node $S = (s_0, s_1, \ldots, s_{n-1})$ to node $T = (t_0, t_1, \ldots, t_{n-1})$ if
and only if $(s_1, s_2, \ldots, s_{n-1}) = (t_0, t_1, \ldots, t_{n-2})$. Any nonsingular
feedback function $f(s_0, s_1, \ldots, s_{n-1}) = s_0 + g(s_1, \ldots, s_{n-1})$
decomposes the deBruijn graph into disjoint cycles if we let
the successor of the node $S = (s_0, s_1, \ldots, s_{n-1})$ be the node
$(s_1, s_2, \ldots, s_{n-1}, f(s_0, s_1, \ldots, s_{n-1}))$. From such a decomposition
one can construct deBruijn sequences by the well known cycle join
method, which consists of joining all the cycles stepwise into a
deBruijn sequence. To join two cycles $C_1$ and $C_2$ one finds a
node $S = (s_0, s_1, \ldots, s_{n-1}) \in C_1$ and a conjugate node
$\hat{S} = (s_0 + 1, s_1, \ldots, s_{n-1}) \in C_2$. After interchanging the successors of $S$
and $\hat{S}$ (which corresponds to changing one value of the function $g$),
the two cycles will be joined to one cycle. Repeating this process
will eventually lead to a deBruijn sequence.
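The cycle join method described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: states are bit tuples, the helper names `cycles_of` and `cycle_join` are hypothetical, and the sketch always merges the cycle containing the all-zero state with a neighbouring cycle, which is only one of many possible join orders.

```python
from itertools import product

def cycles_of(successor, n):
    """Partition all binary n-tuples into the cycles of the permutation `successor`."""
    seen, cycles = set(), []
    for state in product((0, 1), repeat=n):
        if state in seen:
            continue
        cycle, cur = [], state
        while cur not in seen:
            seen.add(cur)
            cycle.append(cur)
            cur = successor[cur]
        cycles.append(cycle)
    return cycles

def cycle_join(f, n):
    """Join the cycles of a nonsingular feedback function f into one deBruijn cycle."""
    succ = {s: s[1:] + (f(s),) for s in product((0, 1), repeat=n)}
    while True:
        cycles = cycles_of(succ, n)
        if len(cycles) == 1:
            return [s[0] for s in cycles[0]]
        owner = {s: i for i, c in enumerate(cycles) for s in c}
        for s in cycles[0]:
            conj = (s[0] ^ 1,) + s[1:]          # conjugate node: first bit flipped
            if owner[conj] != 0:
                # Interchanging the two successors merges the two cycles.
                succ[s], succ[conj] = succ[conj], succ[s]
                break
        else:
            raise RuntimeError("no conjugate pair found; is f nonsingular?")

# Example: f is the recurrence with h(x) = x^4 + x^3 + x^2 + x + 1 (irreducible over GF(2)).
sequence = cycle_join(lambda s: s[0] ^ s[1] ^ s[2] ^ s[3], 4)
print(sequence)
```

Each pass recomputes the cycle structure from scratch, which keeps the sketch short at the cost of efficiency.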

The number of deBruijn sequences that can be obtained from
this construction will depend on the number of conjugate pairs on
all pairs of cycles generated by the function $f$, which is the starting
point of our construction.

We apply the cycle join method to construct deBruijn sequences
from irreducible cyclic codes, i.e. $f$ is a linear recurrence with an
irreducible characteristic polynomial. We give an algebraic
expression for the number of deBruijn sequences constructed in this way
in terms of the cyclotomic numbers.

Methods  and  results 

Let

$$f(s_0, s_1, \ldots, s_{n-1}) = \sum_{i=0}^{n-1} h_i s_i$$

where the characteristic polynomial $h(x) = x^n + \sum_{i=0}^{n-1} h_i x^i$ is
irreducible of degree $n$ over GF(2). Then $f$ will generate $E$ cycles of
length $e$ in the deBruijn graph, where $eE = 2^n - 1$ and $n$ is the
smallest integer such that $2^n \equiv 1 \pmod{e}$.

Let $h(x)$ be the minimum polynomial of an element $\beta = \alpha^E$
for some primitive $(2^n - 1)$-th root of unity $\alpha$ in GF($2^n$). We can
express $\alpha^j$ by

$$\alpha^j = \sum_{i=0}^{n-1} a_{j,i} \beta^i, \qquad 0 \le j < 2^n - 1.$$


We define the mapping $\phi : GF(2^n) \to GF(2)^n$ by $\phi(0) =
(0, 0, \ldots, 0)$ and $\phi(\alpha^j) = (a_{j,0}, a_{j+E,0}, \ldots, a_{j+(n-1)E,0})$. It follows
that $\phi$ is a vector space isomorphism and that the two elements
$\phi(\epsilon), \phi(\epsilon + 1) \in GF(2)^n$ are conjugated for all $\epsilon \in GF(2^n)$, since
$\phi(1) = (1, 0, \ldots, 0)$.

For any $eE = 2^n - 1$, the cyclotomic classes $C_i$, $0 \le i < E$ in
GF($2^n$) are defined as follows:

$$C_i = \{\alpha^{i + jE} \mid 0 \le j < e\}.$$

The cyclotomic number $(i,j)_E$ is defined for $0 \le i, j < E$ by

$$(i,j)_E = \#\{(\xi, \zeta) \mid \xi \in C_i,\ \xi + 1 = \zeta \in C_j\}.$$

The cyclotomic numbers have been extensively studied in the
literature.
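For small fields the cyclotomic numbers can be tabulated directly from the definition. The sketch below assumes GF($2^4$) built from the primitive polynomial $x^4 + x + 1$ and $E = 3$ (so $e = 5$); these parameters and the function names are chosen for illustration only and do not appear in the source.

```python
def gf_powers(n, prim_poly):
    """Return [alpha^0, ..., alpha^(2^n - 2)] for a root alpha of prim_poly.

    Field elements are integers whose bit i is the coefficient of x^i.
    """
    powers, x = [], 1
    for _ in range(2**n - 1):
        powers.append(x)
        x <<= 1
        if x >> n:
            x ^= prim_poly
    return powers

def cyclotomic_numbers(n, prim_poly, E):
    """counts[i][j] = number of xi in C_i with xi + 1 in C_j."""
    powers = gf_powers(n, prim_poly)
    log = {v: k for k, v in enumerate(powers)}
    counts = [[0] * E for _ in range(E)]
    for k, xi in enumerate(powers):
        zeta = xi ^ 1                 # addition of 1 in characteristic 2
        if zeta:                      # xi = 1 maps to 0, which lies in no class
            counts[k % E][log[zeta] % E] += 1
    return counts

for row in cyclotomic_numbers(4, 0b10011, 3):
    print(row)
```

Every row sums to $e$, except row 0, which sums to $e - 1$ because $\xi = 1$ is sent to 0.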

The crucial observation is that under the mapping $\phi$ above, the
cycles in the irreducible cyclic code generated by $h(x)$ correspond
to the cyclotomic classes. Further, the number of conjugated pairs
between the cycles equals the cyclotomic numbers. The exact number
of deBruijn sequences obtainable in this way can now be found
from the Matrix-tree Theorem on the graph where the cycles of $f$
are nodes and the edges correspond to conjugated pairs, since each
tree in this graph represents a deBruijn sequence. Hence, we can
obtain an algebraic expression for the number of deBruijn sequences
obtained by this construction in terms of the cyclotomic numbers.
Using Gaussian sums to approximate the cyclotomic numbers we are
able to show the following result.

Theorem.  The  number  of  deBruijn  sequences  constructed  by  the 
cycle  join  method  starting  from  the  cycles  generated  by  an  irreducible 
polynomial  h(x)  is  at  least 

1  /2"  -  2F  -  (F  -  1)*2"/A 

F  )  ■ 
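The exact count via the Matrix-tree Theorem can be reproduced numerically for small $n$. The following sketch builds the graph whose nodes are the cycles of $f$ and whose edge multiplicities are the numbers of conjugate pairs, then evaluates a cofactor of its Laplacian. Function names are hypothetical, and exact rational arithmetic is used only to keep the determinant free of rounding.

```python
from fractions import Fraction
from itertools import product

def conjugate_pair_graph(f, n):
    """weights[a][b] = number of conjugate pairs between cycles a and b of f."""
    owner, count = {}, 0
    for start in product((0, 1), repeat=n):
        if start in owner:
            continue
        cur = start
        while cur not in owner:
            owner[cur] = count
            cur = cur[1:] + (f(cur),)
        count += 1
    weights = [[0] * count for _ in range(count)]
    for tail in product((0, 1), repeat=n - 1):
        a, b = owner[(0,) + tail], owner[(1,) + tail]
        if a != b:
            weights[a][b] += 1
            weights[b][a] += 1
    return weights

def spanning_tree_count(weights):
    """Matrix-tree theorem: determinant of the Laplacian with one row and column removed."""
    m = len(weights) - 1
    lap = [[Fraction(sum(weights[i]) if i == j else -weights[i][j])
            for j in range(m)] for i in range(m)]
    det = Fraction(1)
    for col in range(m):
        pivot = next((r for r in range(col, m) if lap[r][col] != 0), None)
        if pivot is None:
            return 0
        if pivot != col:
            lap[col], lap[pivot] = lap[pivot], lap[col]
            det = -det
        det *= lap[col][col]
        for r in range(col + 1, m):
            factor = lap[r][col] / lap[col][col]
            for c in range(col, m):
                lap[r][c] -= factor * lap[col][c]
    return int(det)

# h(x) = x^4 + x^3 + x^2 + x + 1: three cycles of length 5 plus the zero cycle.
graph = conjugate_pair_graph(lambda s: s[0] ^ s[1] ^ s[2] ^ s[3], 4)
print(spanning_tree_count(graph))
```

The zero state forms a cycle of its own in this sketch, so it appears as an extra node of the graph.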
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Abstract 


0  0  ...  0  0 


An m-sequence over GF($q^m$) can be expressed as a vector of
m-sequences whose component m-sequences are shifted versions of
an m-sequence over GF($q$). The amount of shift between
components of the m-sequence over GF($q^m$) is given as a ratio of
elements of the trace dual basis corresponding to the basis
expressing GF($q^m$) over GF($q$). An efficient algorithm, which does
not require evaluation of the trace, is developed for obtaining the
ratio of trace dual basis elements to the first element. The dual basis
can then be completely obtained by evaluating the first dual basis
element. Another algorithm is developed which efficiently evaluates
this first element. Included in this algorithm is a sequential
evaluation of the trace which can be sequentially obtained directly in
terms of the coefficients of the primitive polynomial that generates
GF($q^m$).


Summary 


0  bj  bj  ...  b„.j  iv 

0  bj  b^  ...  b„  0 


(3) 


0  Vi  K  -  0  0 

0  b„  0  ...  0  0 


The ratio of dual basis elements can be obtained, using (3), as


(4) 


Let $\{\lambda_0, \lambda_1, \ldots, \lambda_{m-1}\}$ be the trace dual basis of the basis
$\{1, \gamma, \ldots, \gamma^{m-1}\}$ generated by the primitive polynomial $h(x)$
expressing GF($q^m$) over GF($q$). Also, let $g(x)$ be the degree $mn$
primitive polynomial with root $\alpha$ that generates the m-sequence $c$
over GF($q$) and $f(x)$, which divides $g(x)$, be the degree $n$ primitive
polynomial with root $\alpha$ that generates the vector m-sequence $d$ over
GF($q^m$). Then, for some $c_0$, $d$ can be expressed as [1]


m-l  J 


C. 


(1) 


where  T*c  indicates  a  left  shift  of  c  by  i  elements,  7=0”, 

k. 

z=(q"”-l)/(q“-l),  gcd(w,q"'-l)=l,  and  7  The  shift  of  the 

jth  component  relative  to  the  0th  oonqionent  is  then  given  as 
zkjW  m^(q'™-l),  where  w  is  used  for  choosing  one  of  the 

4Kq™-iym  basis  or  primitive  polynomials  for  expressing  GF(q"') 
over  GF(q)  and  is  die  Euler  function. 

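The dual basis can be obtained in terms of $B = A^{-1}$ where [2]

$$A = \begin{bmatrix}
\mathrm{tr}(1) & \mathrm{tr}(\gamma) & \mathrm{tr}(\gamma^2) & \cdots & \mathrm{tr}(\gamma^{m-1}) \\
\mathrm{tr}(\gamma) & \mathrm{tr}(\gamma^2) & & \cdots & \mathrm{tr}(\gamma^{m}) \\
\vdots & & & & \vdots \\
\mathrm{tr}(\gamma^{m-1}) & \mathrm{tr}(\gamma^{m}) & & \cdots & \mathrm{tr}(\gamma^{2m-2})
\end{bmatrix}$$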
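The relation between the trace matrix $A$ and the dual basis can be illustrated numerically. The sketch below is restricted to $q = 2$ and assumes GF($2^4$) defined by $x^4 + x + 1$ with $\gamma$ the class of $x$; it evaluates the trace directly, so it shows only the defining relation $B = A^{-1}$ and not the trace-free algorithm developed in the paper. All function names are illustrative.

```python
def gf_mul(a, b, m, poly):
    """Multiply two elements of GF(2^m) given as bit masks, modulo `poly`."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> m:
            a ^= poly
    return r

def trace(x, m, poly):
    """Trace x + x^2 + ... + x^(2^(m-1)) from GF(2^m) to GF(2); returns 0 or 1."""
    t, y = 0, x
    for _ in range(m):
        t ^= y
        y = gf_mul(y, y, m, poly)
    return t

def invert_gf2(mat):
    """Gauss-Jordan inversion of a square 0/1 matrix over GF(2)."""
    m = len(mat)
    aug = [row[:] + [int(i == j) for j in range(m)] for i, row in enumerate(mat)]
    for col in range(m):
        pivot = next(r for r in range(col, m) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col and aug[r][col]:
                aug[r] = [u ^ v for u, v in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

def dual_basis(m, poly):
    """Trace dual basis of {1, gamma, ..., gamma^(m-1)} with gamma = x."""
    gamma_pow = [1]
    for _ in range(2 * m - 2):
        gamma_pow.append(gf_mul(gamma_pow[-1], 2, m, poly))
    A = [[trace(gamma_pow[i + j], m, poly) for j in range(m)] for i in range(m)]
    B = invert_gf2(A)
    duals = []
    for row in B:
        lam = 0
        for i, bit in enumerate(row):
            if bit:
                lam ^= gamma_pow[i]   # lambda_j = sum_i B[j][i] * gamma^i
        duals.append(lam)
    return duals

print(dual_basis(4, 0b10011))
```

Each returned element $\lambda_j$ satisfies $\mathrm{tr}(\lambda_j \gamma^k) = 1$ if $j = k$ and 0 otherwise.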


Representing the components of the m-sequence generated by $h(x)$
as $a_i = \mathrm{tr}(\theta\gamma^i)$, a modified $A$, $A'$ (exact if $\theta = 1$), with initial
conditions $a_0 = 1$ and $a_i = 0$, $i = 1, 2, \ldots, m-1$ can be evaluated without
calculating the trace. Due to the symmetry and triangular nature of
$A'$, $B'$, the modified $B$, reduces to


where 


b„,= 


(q-2Xq“-iy(q-i)  ^  jjj  ^  gygu 

( 

I-  [(q-3y2K<f”-iy(q-J)  J 

7*  ^  ^  ^  q  odd  and  m  even. 


(5) 


If  h(x)  is  a  trinomial  of  the  form  h(x)=x“+hjx+ho,  B'  is  a 
monomial  matrix  and  the  shift  factors  i=l,2,...,m-l,  are 

obtained  firom  7  “'^binY'  as 


(^)^+i.qoddand 


meven 


i=lA-...m-l.(6) 


The determination of the dual basis as opposed to the ratio of
dual basis elements requires the evaluation of $2m - 1$ trace functions
as shown in $A$. An algorithm for the evaluation of the trace directly
in terms of the coefficients of the primitive polynomial $h(x)$ that
generates GF($q^m$) and traces of smaller powers of $\alpha$ is also
developed here as


i-l 


- 1  -  K.V  i=l'2....ni. 


(7) 


As  usual,  tr^(l)=m  and  for  iSxo,  tr^CoO  can  be  expressed  as  a 

linear  combination  of  ti^(oc’),  j=0,l,....  m-l.  Thus,  can  be 
obtained  by  solving  bx  the  first  column  of  B  and  the  remaining  dual 
basis  elements  obtained  by  multiplying  (3)  by  or  the  entire  dual 
basis  can  be  obtained  solving  for  all  of  B. 
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Abstract: We present a theory of linear recurrences defined on convex
lattices in the 2D plane and propose a generalization of the 2D
Berlekamp-Massey algorithm which finds a minimal set of linear recurrences capable of
generating a 2D array on a 2D convex lattice. Furthermore we show that
this algorithm is applicable to decoding efficiently some kinds of algebraic
geometry codes, in particular codes introduced by S. Miura and N. Kamiya.


^  This  manuscript  is  a  revised  and  extended  version  of  the  paper  which  was 
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Linear block codes over rings are discussed for q-ary PSK
over a Gaussian channel. An upper bound on the minimum
Euclidean distance is given and proved. A lower bound on
the block error probability is discussed.

Introduction 

We discuss bounds for the following PSK system:

[Figure: block diagram of the PSK system with additive noise n(t)]

The modulation is q-ary PSK with energy per dimension.

The channel adds white Gaussian noise. The detector has $q$
congruent regions and the decoder makes error correction.
The modulator, the channel and the detector together form a
memoryless additive q-ary channel. Thus, $y = x + z \pmod{q}$,
where the probability of an error pattern $z$ is a function of
the SNR and of the composition of $z$.

Definition 1. Let $q$, $m$ and $n$ be positive integers and let $H$
be an $m$ by $n$ matrix with elements in $Z_q$. Then,

$$C = \{x;\ xH^T \equiv 0 \pmod{q}\}$$

is a linear block code over $Z_q$. ♦

Definition 2. Let $y$ be an $n$-tuple over $Z_q$ and let $c_i(y)$ be the
number of symbols in $y$ which are equal to $i$ or $(q - i)$. Then,

$$c(y) = (c_1(y), c_2(y), \ldots, c_r(y))$$

is the composition of $y$. For odd $q$, $r = (q-1)/2$. For even $q$, $r = q/2$. ♦


Definition 3. Let $y$ be an $n$-tuple over $Z_q$. Then,


Definition 4. Let $y$ be an $n$-tuple over $Z_q$ and let $|y_j|$ be the
minimum of $y_j$ and $(q - y_j)$. Then,

$$w_L(y) = \sum_{j=1}^{n} |y_j| = \sum_{i=1}^{r} i\, c_i(y)$$

is the Lee weight of $y$.
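Definitions 2 and 4 translate directly into code. The short sketch below computes the composition and the Lee weight of an $n$-tuple over $Z_q$ and can be used to confirm that the two sums in Definition 4 agree; the function names are illustrative.

```python
def composition(y, q):
    """c(y) = (c_1(y), ..., c_r(y)), where c_i counts symbols equal to i or q - i."""
    r = q // 2  # equals (q - 1) / 2 for odd q and q / 2 for even q
    return tuple(sum(1 for s in y if s % q in (i, q - i)) for i in range(1, r + 1))

def lee_weight(y, q):
    """Sum over positions of min(y_j, q - y_j)."""
    return sum(min(s % q, q - s % q) for s in y)

y = (0, 1, 7, 4, 3, 5)
print(composition(y, 8), lee_weight(y, 8))
```

For the 8-ary tuple above, the symbols 1 and 7 both count toward $c_1$, and 3 and 5 toward $c_3$.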


Theorem 1. Consider a linear block code $C$ over $Z_q$, of
block length $n$ and including $M$ codewords. Let $t$ be the
smallest integer, $t \le n$, such that $M \sum_{i=0}^{t} \binom{n}{i} 2^i > q^n$.

The minimum Euclidean distance $d_{E\min}$ between two
codewords in $C$ is then upper bounded by

‘^Einln^2-yj^%TeinY  ♦ 

Proof of theorem 1. We first recall that for any linear
code, the difference between two codewords is a codeword.
The minimum Euclidean distance between two codewords
then equals the minimum Euclidean weight of a codeword.
Consider the following subset $S$ of the set of $n$-tuples over $Z_q$:

$$S = \{y;\ c_1(y) \le t \text{ and } c_i(y) = 0 \text{ for } i > 1\}.$$

The number of $n$-tuples in $S$ is $|S| = \sum_{i=0}^{t} \binom{n}{i} 2^i$. By
assumption, $M|S| > q^n$. Since the code is a subgroup in $Z_q^n$,
there must be two $n$-tuples, $y_1$ and $y_2$ in $S$ which are in the
same coset with respect to $C$. Then, $x = y_1 - y_2$ is
in $C$. We need an upper bound on $w_E(x)$. We have $c_i(y_1) =
c_i(y_2) = 0$ for $i > 1$, which implies that $c_i(x) = 0$ for $i > 2$.
Furthermore, $w_L(y_1) \le t$ and $w_L(y_2) \le t$ which implies that
$w_L(x) \le 2t$. Thus, $c_1(x) + 2c_2(x) \le 2t$ and $c_i(x) = 0$ for $i > 2$.
Under this condition, $w_E(x)$ is maximum for $c_1(x) = 0$ and
$c_2(x) = t$. The upper bound is then given by definition 3. ♦
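The parameter $t$ of Theorem 1 is easy to compute from $q$, $n$ and $M$. The following sketch implements only the counting condition $M \sum_{i=0}^{t} \binom{n}{i} 2^i > q^n$ from the theorem; the function name `smallest_t` is an assumption made for illustration.

```python
from math import comb

def smallest_t(q, n, M):
    """Smallest t <= n with M * sum_{i<=t} C(n, i) * 2^i > q^n, or None if there is none."""
    size_of_S = 0
    for t in range(n + 1):
        size_of_S += comb(n, t) * 2**t   # n-tuples with exactly t entries equal to +1 or -1
        if M * size_of_S > q**n:
            return t
    return None

print(smallest_t(4, 4, 16))
```

The running sum equals $|S|$ from the proof, so the function also shows how quickly the pigeonhole argument applies as $M$ grows.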


Assume fixed $q$, $n$, and SNR. Then, each possible error
pattern $z$ has a fixed probability $P(z)$. Assume a linear code
with $M$ equally likely codewords. Then, independently of
which codeword is transmitted, the code can correct at most
one error pattern in each coset. The number of cosets is $q^n/M$.

We define $V$ to be the set of the $q^n/M$ most likely error patterns.

The best possible linear code then, is a code for which all
error patterns in $V$ belong to distinct cosets. Thus,

$$P_e \ge 1 - \sum_{z \in V} P(z)$$

Consider a list of all possible error patterns, arranged in
order of decreasing probability. The list is then actually a list
of composition classes. The number of error patterns in each
class can easily be calculated as a multinomial coefficient
multiplied by a power of two. The bound is then quite easy to
calculate because there is at most one composition class
which is partly inside and partly outside $V$.
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Abstract 

Binary formally self-dual (f.s.d.) even codes are the
one type of divisible [2n,n] codes which need not be
self-dual. On occasion a f.s.d. even [2n,n] code
can have a larger minimum distance than a [2n,n]
self-dual code. We give many examples of interesting
f.s.d. even codes. We also obtain a strengthening
of the Assmus-Mattson theorem. If $C$ is a
f.s.d. extremal code of length $n \equiv 2 \pmod 8$ [$n \equiv 6 \pmod 8$],
then the words of a fixed weight in $C \cup C^\perp$
hold a 3-design [1-design]. Finally, we show that the
extremal f.s.d. codes of lengths 10 and 18 are unique.

Summary 

A code $C$ is formally self-dual (f.s.d.) if $C$
and $C^\perp$ have the same weight distribution. A code
is divisible if the weight of every vector
is divisible by a constant $\delta$ greater than one. The
Gleason-Pierce theorem characterizes the fields over
which a formally self-dual divisible code can exist and
shows that $\delta$ must be either 2, 3 or 4. In all cases but
one a formally self-dual divisible code is in fact
self-dual. We consider this case in this paper, which is
the case of binary f.s.d. codes with $\delta = 2$. We call
such codes f.s.d. even codes or simply type I codes.

The  classification  of  self-dual  codes  with  vectors  of 
minimal  weight  2  is  trivial,  since  a  vector  of  weight 
2  in  such  a  code  is  a  direct  summand.  The  existence 
of  f.s.d.  even  codes  with  a  fixed  number  of  vectors  of 
weight  2  is  complicated.  We  prove  the  following; 

Theorem  1.  1.  //m,  are  positive  integers  such  that 

r 

^rrij  =  n(n  >  2),  then  there  exists  a  f.s.d.  even 
i=l 

r 

code  C  of  length  2n  with  A2=  (™')'  furthermore, 

1=1 

C  is  equivalent  to  its  dual. 

r 

2.  //  m,  =  n  —  r/2  where  r  is  even,  then  there 

i  =  l 

exists  a  f.s.d.  even  code  C  of  length  2n  with  A2  = 

r 

Again  C  is  equivalent  to  its  dual. 

1=1 
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Type  I  codes  which  are  not  self-dual  often  have 
a larger minimum distance for a given length than
self-dual codes. In fact there are extremal f.s.d.
codes where self-dual codes cannot exist. We
construct some of these extremal codes and list the open
cases where the existence of extremal f.s.d. even
codes has not been determined. Also, we give an
infinite family of f.s.d. even codes which are not
equivalent to their duals, using the Turyn construction.

It is well-known that one can obtain designs from
vectors of a fixed weight in an extremal self-dual code
by means of the Assmus-Mattson theorem [1]. One
can extend the Assmus-Mattson theorem to the words
of a fixed weight in $C \cup C^\perp$ as follows:

Theorem 2. Let $C$ be a [2n,n] extremal f.s.d. even
code and consider the set $S$ of vectors of a fixed weight
in $C$ and in $C^\perp$. Then the set $S$ holds a 3-design
whenever $2n \equiv 2 \pmod 8$ and a 1-design whenever
$2n \equiv 6 \pmod 8$.

Extremal type I codes exist at length 10 and 18.
Delsarte  [2]  constructed  a  [10,5,4]  f.  s.  d.  code  and  ex¬ 
hibited  the  underlying  3-designs  by  means  of  inversive 
planes,  while  Assmus  and  Mattson  [1]  constructed  a 
[18,9,6]  f.  s.  d.  code  as  an  extended  quadratic  residue 
code.  They  exhibited  the  underlying  3-designs  as  a 
consequence  of  a  3-set  transitive  group  action. 

We  prove  the  following  about  these  codes; 

Theorem  3.  Any  two  f.s.d.  even  [10,5,4]  codes  are 
equivalent. 

Theorem  4.  Any  two  f.s.d.  even  [18,9,6]  codes  are 
equivalent. 
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Abstract 

Given an ordered basis of $F_2^n$ and an integer $d$, we
define a greedy algorithm for constructing a code of
minimum distance at least $d$. We show that these
greedy codes are linear and construct a parity check
matrix for them. A special case of this algorithm
gives the lexicodes, thereby providing a proof of their
linearity which is independent of game theory. For
ordered bases which have a triangular form we are
able to give a lower bound on the dimension of greedy
codes. Some greedy codes are better than lexicodes.

Summary 

Let $n$ and $d$ be integers with $0 < d < n$ and
suppose that the set $F_2^n$ of binary $n$-tuples has been listed
in some order. Choosing the first vector on the list
and then applying recursively the rule:

Choose the next vector on the list whose (Hamming)
distance to each previously chosen vector is at
least $d$.

defines a binary code with minimum distance at least
$d$. Such greedy codes were discussed in [2,3] in the
case that the binary $n$-tuples are listed in
lexicographic order.
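The greedy rule above takes only a few lines of Python. In this sketch binary $n$-tuples are represented as integers, so that `range(2**n)` lists them in lexicographic order; the function name `greedy_code` is illustrative.

```python
def greedy_code(ordered_vectors, d):
    """Scan the list once, keeping each vector at distance >= d from all kept so far."""
    code = []
    for v in ordered_vectors:
        if all(bin(v ^ c).count("1") >= d for c in code):
            code.append(v)
    return code

# Lexicode of length 7 and designed distance 3.
lexicode = greedy_code(range(2**7), 3)
print(len(lexicode))
```

The first vector on the list is always accepted, since the condition is vacuous for an empty code.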

Let $B$ denote an ordered basis $y_1, y_2, \ldots, y_n$ of
$F_2^n$. The ordered basis $B$ induces an order of the
vectors of $F_2^n$ defined recursively as follows: Let
$V_0 = \{(0, 0, \ldots, 0)\}$ and let

$$V_i = \langle y_1, \ldots, y_i \rangle \qquad (i = 1, 2, \ldots, n)$$

be the subspace of $F_2^n$ spanned by the vectors
$\{y_1, \ldots, y_i\}$. The subspace $V_0$ contains a unique
vector and hence its vectors are ordered. Suppose the
vectors in $V_{i-1}$ have been ordered

$$x_1, x_2, \ldots, x_m \qquad (m = 2^{i-1}).$$

We have the partition

$$V_i = V_{i-1} \cup (y_i \oplus V_{i-1})$$
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and  NSA  Grant  MDA904-89-H-2060. 
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and we order the vectors in $V_i$ by following the
vectors $x_1, x_2, \ldots, x_m$ with the vectors $y_i \oplus x_1, y_i \oplus x_2, \ldots, y_i \oplus x_m$:

$$x_1, x_2, \ldots, x_m, y_i \oplus x_1, y_i \oplus x_2, \ldots, y_i \oplus x_m.$$

Since $V_n = F_2^n$, this defines an order for the vectors
of $F_2^n$ which we call the order induced by $B$ or, for
short, the $B$-order of $F_2^n$.

Let $B$ be an ordered basis of $F_2^n$ and let $d$ be an
integer with $0 < d < n$. Applying the greedy algorithm
(for the chosen $d$) to the $B$-order of $F_2^n$ we obtain a
code $C = C(B, d)$ whose minimum distance is at least
$d$. The code $C$ is the $B$-greedy code of length $n$ and
designed distance $d$. The lexicodes are a special case
of $B$-greedy codes.
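The $B$-order and the resulting $B$-greedy code can be generated as follows. This is a minimal sketch with vectors encoded as integers; the triangular basis used in the example is an arbitrary illustrative choice, and the function names are not from the paper.

```python
def b_order(basis):
    """Order of the span of `basis`: V_{i-1} followed by y_i + V_{i-1}, recursively."""
    order = [0]
    for y in basis:
        order += [y ^ x for x in order]   # right-hand side is built before extending
    return order

def greedy_code(ordered_vectors, d):
    """Keep each vector whose distance to every previously kept vector is >= d."""
    code = []
    for v in ordered_vectors:
        if all(bin(v ^ c).count("1") >= d for c in code):
            code.append(v)
    return code

# The standard basis (least significant bit first) reproduces lexicographic order.
print(b_order([1, 2, 4]))

# A triangular ordered basis of F_2^6, chosen for illustration.
basis = [0b000001, 0b000011, 0b000111, 0b001111, 0b011111, 0b111111]
code = greedy_code(b_order(basis), 3)
print(len(code), all((a ^ b) in set(code) for a in code for b in code))
```

The final check tests closure under addition, which is the linearity property stated as the main result.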

Our main result is that $B$-greedy codes are always
linear and we show how to enhance the greedy
algorithm in order to determine a parity check matrix of
the code. We also show that it suffices to consider
only $B$-greedy codes of even designed distance. The
$B$-greedy codes for which $B$ is a triangular ordered
basis are called triangular-greedy codes.

We  present  computer  data  which  shows  that  these 
codes  have  dimension  within  one  of  the  best  codes 
known  [4]. 
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Abstract 

The notion of the $r$-cover-free families was introduced by
Kautz and Singleton in 1964 [17]. They initiated investigating
binary codes with the property that the disjunctions of any
$\le r$ ($r \ge 2$) codewords are distinct ($UD_r$ codes). This led them
to studying the binary codes with the property that none of the
codewords is covered by the disjunction of $\le r$ others
(superimposed codes, $ZFD_r$ codes; P. Erdos, P. Frankl and Z. Furedi
called the corresponding set system $r$-cover-free in [7]).

Since then many results have been proved about the
maximum size of these codes. Various authors studied these
problems basically from three different points of view, and these
three lines of investigations were almost independent of each
other. This is why many results were found first in information
theory ([1], [4], [5], [14], [15], [16], [17]), were later rediscovered
in combinatorics ([2], [6], [7], [10]), or in group testing ([12],
[13]), and vice versa.

We shall approach this area from the combinatorial side.
Our main goal is to estimate the maximal size of the family
of subsets of an $n$-element set with the property that no set is
covered by the union of $r$ others.

Summary 

Let $S$ be an $n$-element set. $2^S$ is the set of all subsets of $S$. $\binom{S}{k}$
denotes the set of all $k$-subsets of $S$ ($k \ge 0$). If $|S| = n$, then
$|\binom{S}{k}| = \binom{n}{k}$. We denote by $[n]$ the set $\{1, 2, \ldots, n\}$, and $\log x$ is
always of base 2. A set system $\mathcal{A} \subseteq 2^S$ is called $k$-uniform if its
members are $k$-sets. It is usually supposed that the underlying
set of the set systems is $[n]$.

We call $\mathcal{F} \subseteq 2^S$ $r$-distinct, if $\bigcup_{i=1}^{k} A_i \ne \bigcup_{j=1}^{\ell} B_j$ for
any $\{A_1, A_2, \ldots, A_k\} \ne \{B_1, B_2, \ldots, B_\ell\}$, $1 \le k, \ell \le r$;
$A_1, A_2, \ldots, A_k, B_1, B_2, \ldots, B_\ell \in \mathcal{F}$. $\mathcal{F} \subseteq 2^S$ is $r$-cover-free, if
$A_0 \not\subseteq A_1 \cup A_2 \cup \ldots \cup A_r$ holds for all distinct $A_0, A_1, \ldots, A_r \in
\mathcal{F}$. $\mathcal{F} \subseteq 2^S$ is $\frac{1}{r}$ part intersecting, if $|A_i \cap A_j| <
\frac{1}{r}\min\{|A_i|, |A_j|\}$ holds for any distinct $A_i, A_j \in \mathcal{F}$. We
denote by $T'(r,n)$, $T(r,n)$, $T^*(r,n)$ and $T'(r,n,k)$, $T(r,n,k)$,
$T^*(r,n,k)$ the maximum cardinality of the corresponding set
systems in general and in $k$-uniform case, resp. We will
provide upper bounds on these functions for $r$ fixed and $n$ tending
to infinity.
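The $r$-cover-free condition can be tested exhaustively for small families. The sketch below is a direct transcription of the definition, with an illustrative function name; its running time grows combinatorially, so it is only suitable for toy examples.

```python
from itertools import combinations

def is_cover_free(family, r):
    """True if no member is contained in the union of r other distinct members."""
    sets = [frozenset(a) for a in family]
    for i, a0 in enumerate(sets):
        others = sets[:i] + sets[i + 1:]
        # If fewer than r other members exist, there is nothing to check for a0.
        for group in combinations(others, r):
            if a0 <= frozenset().union(*group):
                return False
    return True

print(is_cover_free([{1}, {2}, {3}], 2))
print(is_cover_free([{1}, {2}, {1, 2}], 2))
```

Checking unions of exactly $r$ members suffices: a set covered by fewer than $r$ others is also covered once further members are added to the union.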

The following upper and lower bounds were proved in [1],
[4], [5], [7], [13]: there exist two (absolute) constants $c_1$, $c_2$ such
that

$$\frac{c_1}{r^2} \le \frac{\log T(r,n)}{n} \le \frac{c_2}{r}$$

for any $n$. In most papers the lower bound is proved by
probabilistic methods. In [13] V.T. Sos and F.K. Hwang used a
greedy-type algorithm to generate $\frac{1}{r}$ part intersecting
families for proving the lower bound. The upper bound was proved
using the observation that, by definition, $\sum_{i=1}^{r} \binom{T'(r,n)}{i} \le 2^n$. The
gap between the upper and lower bounds is rather large.
Dyachkov and Rykov obtained a better upper bound [4]:

$$\frac{\log T(r,n)}{n} \le c_3 \frac{\log r}{r^2} \qquad (2)$$

for some absolute constant $c_3$ and any $n$. Their proof is rather
involved. Here we shall give a simple and purely combinatorial
proof of this result.
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1  Introduction 

Let $C_i$, $i = 1, 2, \ldots$ denote an infinite family of binary
codes with length $n_i$, covering radius $t_i$, minimum
distance $d_i$. Assume that the limit $\rho$ (resp. $\delta$) of the ratio $t_i/n_i$
(resp. $d_i/n_i$) for large $i$ exists and call it normalized covering
radius (resp. distance). Our aim is to study the set $Y_2$
(resp. $Y_2^{lin}$) of points $(\rho, \delta)$ of the unit square achieved
by binary families of codes (resp. of linear codes). We
address the following questions for both domains

1.  bounds  on  the  extreme  points 

2.  convexity 

3.  continuity  at  the  border. 

Both  sets  split  naturally  into  four  subdomains  according 
to  the  position  of  p  and  S  w.r.t.  J. 

2  Convexity 

Question  2  is  still  unsolved.  A  weaker  result  is  the  fol¬ 
lowing 

Theorem 1 Let $(x,y) \in Y_2^{lin}$ (resp. $\in Y_2$). Then every
point of the line between $(x,y)$ and $(1,0)$ lies in $Y_2^{lin}$ (resp.
$Y_2$).

3  Upper  Left  Corner 

(0.5  <4<1,  0<p<  0.5))  The  Plotkin  bound  shows 
that  this  corner  is  empty. 


The  MR2W  (bound  of  four)  bound  yields  a  more  pre¬ 
cise  but  less  explicit  result.  Nonconstructive  results 

(1]  of  Cohen  and  FrankI  entail  that  the  intersection  of 
the  Brst  axes’  bissector  with  this  corner  lies  entirely  in 
W*”.  Theorem  1  shows  then  that  the  whole  triangle 
{(0,0).(0.5.0.5).(1,0))  is  in  Manin  [3]  showed  the 
existence  of  a  continuous  function  03(6)  which  is  the 
“true  upper  bound”  on  the  rate.  Analogously  define 
^  the  “true  lower  bound”  on  p.  We  do  not  know 
if  02  continuous  in  this  corner.  All  that  is  known  is 
^  >  i?2(5)>  M-H\-a2(6)). 

5  Right  Half-Square 

This  means  p  >  0.5. 

5.1  Linear  Codes 

A  simple  construction  shows  that  the  line  segment  {p  = 
0.5,0  <  4  <  5)  lies  entirely  in  Y^" .  By  Theorem  1  the 
triangle  spanned  by  this  segment  and  the  point  (1.0)  lies 
entirely  in  Yj'"  Now  recall  the  Janwa's  bound  [2] 


Families  of  linear  codes  with  ti  =  1  (for  i  large  enough) 
lie  on  p  =  ;  -  I  Families  cf  linear  codes  with  i-,  >  2  (for 
i  large  enough)  lie  under  p  =  1  —  which  is  a  side  of 
the  preceding  triangle.  This  settle  the  three  questions  for 
this  corner  in  the  linear  case. 

5.2  Unrestricted  Codes 

It  is  easy  to  see  that  p  <  1  -  |  is  valid  for  all  (fam¬ 
ilies  of  )  codes  with  at  least  two  words.  The  Plotkin 
bound  yields  4  <  |  for  families  of  codes  with  at  least 
three  words.  A  careful  study  of  3- word  codes  based  on 
the  Sloane- Mattson  [5,  4]  linear  programming  methodol¬ 
ogy  shows  that  there  are  such  codes  on  the  line  p  =  4.  So 
Fi  consists  ofY}"',  plus  the  triangle  with  sides  the  bis¬ 
sector  and  the  first  two  Janwa  bounds.  The  status  of  the 
triangle  (1/2, 2/3)(4/7,4/3)(2/3, 2/3)  is  still  unresolved. 
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4  Lower  Left  Corner 

(0  <  4  <  0.5,  0  <  p  <  0.5))  Let  0(4)  denote  any  bound 
on  the  rate  as  a  function  of  the  distance.  Eliminating  the 
rate  by  the  sphere  covering  bound  yields 

p>  //-'(l-0(4)), 

where  ff  is  the  entropy  function.  Taking  0  to  be  the 
Elias  Bound  yields  the  following  weak  but  elegant  result 

4<2p(l-p). 
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Figure 1: Linear Codes


Figure 2: Unrestricted Codes
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1  Introduction 


Proof Suppose code $C^\perp$ has a codeword of weight $w \ne 0$. We can put the
parity-check matrix $H$ for code $C$ into the following form:


A code $C \subseteq F_2^n$ has covering radius (at most) $r$, if $d(z, C) \le r$ for all $z \in F_2^n$.
For linear codes, the covering radius is the highest weight of any coset leader
of the code. A basic question concerning the covering radius of codes is to
determine $K(n,r)$, the minimum cardinality of any block code of length $n$
and covering radius $r$. For linear codes this question amounts to determining
$l(m,r)$, the minimum length of a linear code with codimension $m$ and covering
radius $r$. We show how techniques from coding theory can be successfully
applied to improve bounds on $l(m,r)$ found in the literature [3-7].

2  Preliminaries 


1. All syndromes of the form (*, 1) should be the sum of one or two columns
of matrix B, hence $w + w(n - w) = w(n + 1 - w) \ge 2^{m-1}$.

2. The columns of matrix $A_1$ 1-cover $F_2^{m-1}$ using matrix $A_0$. Now
application of Lemma 2 proves the statement.

3. Via elementary row operations on matrix H we obtain a zero-column in
the matrix. Now we can apply the 'inversion property' (Lemma 3). □

Remark 6 Notice that Property 1 is weaker than Property 2, since Property 2
together with the sphere-covering bound implies Property 1. Often
significantly better bounds for K(n, 1) are known. □


Let C be a code of length n with covering radius r.

A trivial lower bound for the size of a covering code is given by the
Sphere-Covering Bound

$|C| \sum_{i=0}^{r} \binom{n}{i} \ge 2^n$ (1)

The Van Wee bound [1] improves on this bound, whenever $(r + 1) \nmid (n + 1)$:


As a direct consequence we obtain

If n is even, then $K(n, 1) \ge 2^n/n$ (3)
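
To see the size of the improvement, the following sketch evaluates the sphere-covering bound (1) and the even-length bound (3) for radius one. The function names are illustrative; only the two inequalities themselves are taken from the text.

```python
from math import comb

def sphere_covering_bound(n, r):
    """Smallest integer |C| allowed by |C| * sum_{i<=r} binom(n, i) >= 2^n."""
    ball = sum(comb(n, i) for i in range(r + 1))
    return -(-(2 ** n) // ball)  # ceiling division

def even_length_bound(n):
    """Smallest integer allowed by K(n, 1) >= 2^n / n, valid for even n."""
    if n % 2 != 0:
        raise ValueError("bound (3) is stated for even n only")
    return -(-(2 ** n) // n)

if __name__ == "__main__":
    for n in (4, 6, 8, 10):
        print(n, sphere_covering_bound(n, 1), even_length_bound(n))
```

For even n the ball size n + 1 in (1) is replaced by n in (3), which is where the gain comes from.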

Blokhuis and Lam [2] showed that arbitrary coverings and sphere-coverings
can be linked.

Definition 1 Let $S \subseteq F_2^k$ and let A be a k x n matrix. S is said to r-cover
$F_2^k$ using matrix A, if $\{s + wA^T \mid s \in S \text{ and } wt(w) \le r\} = F_2^k$. □

Lemma 2 If S r-covers $F_2^k$ using matrix A, then the set
$C := \{w \in F_2^n \mid wA^T \in S\} \subseteq F_2^n$ has covering radius r. In particular,
$K(n, r) \le |S| 2^{n-k}$. □
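
A brute-force sketch of Lemma 2 for small parameters follows. It assumes binary vectors, takes the matrix A as a list of k rows of length n, and uses the parity check matrix of the [7,4] Hamming code with S = {0} as a test case; all helper names are hypothetical.

```python
from itertools import product

def vectors(n):
    """All binary n-tuples."""
    return list(product((0, 1), repeat=n))

def add(u, v):
    """Componentwise sum modulo 2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def syndrome(A, w):
    """w A^T over F_2, where A is a list of k rows of length n."""
    return tuple(sum(a * b for a, b in zip(row, w)) % 2 for row in A)

def r_covers(S, A, r):
    """True if {s + w A^T : s in S, wt(w) <= r} is all of F_2^k."""
    k, n = len(A), len(A[0])
    reached = {add(s, syndrome(A, w))
               for s in S for w in vectors(n) if sum(w) <= r}
    return len(reached) == 2 ** k

def code_from_cover(S, A):
    """The set C = {w in F_2^n : w A^T in S} of Lemma 2."""
    targets = set(S)
    return [w for w in vectors(len(A[0])) if syndrome(A, w) in targets]

def covering_radius(C, n):
    """Largest distance from any binary n-tuple to the code C."""
    return max(min(sum(add(z, c)) for c in C) for z in vectors(n))

# Columns of HAMMING are the binary expansions of 1..7.
HAMMING = [
    (0, 0, 0, 1, 1, 1, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (1, 0, 1, 0, 1, 0, 1),
]

if __name__ == "__main__":
    S = [(0, 0, 0)]
    C = code_from_cover(S, HAMMING)
    print(r_covers(S, HAMMING, 1), len(C), covering_radius(C, 7))
```

Here |C| = |S| 2^{n-k} = 16 because the chosen matrix has full row rank.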

Linear codes can be slightly modified without changing the covering radius, as
is demonstrated by the following trivial 'inversion property.' For codes with
even covering radius this property was already mentioned in [4].

Lemma 3 The linear codes with parity check matrices H =

resp. H' = ( g I ) have the same covering radius. □

Lemma 4 If C is an [n, k, d] one-weight code without zero-positions, then
$d(2^k - 1) = n 2^{k-1}$. In particular $2^{k-1} \mid d$. □
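
Lemma 4 can be illustrated on the binary simplex code of dimension 3, which is a one-weight code of length 7 with no zero-position. The sketch below enumerates the code from a generator matrix; the choice of example and all names are assumptions made for illustration.

```python
from itertools import product

def span(generators):
    """All F_2-linear combinations of the given generator rows."""
    n = len(generators[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(generators)):
        word = tuple(
            sum(c * g[j] for c, g in zip(coeffs, generators)) % 2
            for j in range(n)
        )
        words.add(word)
    return words

def one_weight_parameters(generators):
    """Return (n, k, d) if the spanned code has a single nonzero weight."""
    code = span(generators)
    n = len(generators[0])
    k = len(code).bit_length() - 1  # |C| = 2^k
    weights = {sum(w) for w in code if any(w)}
    if len(weights) != 1:
        raise ValueError("not a one-weight code")
    return n, k, weights.pop()

# Generator matrix of the [7, 3, 4] simplex code (columns are 1..7 in binary).
SIMPLEX = [
    (0, 0, 0, 1, 1, 1, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (1, 0, 1, 0, 1, 0, 1),
]

if __name__ == "__main__":
    n, k, d = one_weight_parameters(SIMPLEX)
    print(n, k, d, d * (2 ** k - 1), n * 2 ** (k - 1))
```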


3 New Bounds for Linear Covering Codes

A linear covering code imposes restrictions on the form of its dual code. This
observation enables us to transform the problem of designing a 'good' linear
covering code into the problem of designing a (dual) linear code with a lot of
structure imposed onto it. Techniques from coding theory might show that
such a dual code cannot exist. We demonstrate the main idea for covering
radius two.


As an application of Lemma 5 we prove the bound $l(2m - 1, 2) \ge 2^m + 1$ for
$m \ge 3$, improving by one the minimum value of $l(2m - 1, 2)$ implied by the
Van Wee bound. This bound was conjectured by Brualdi, Pless and Wilson
[3], but up to now only the case m = 6 has been settled [4,5]. The proof is
surprisingly simple.

Example $l(2m - 1, 2) \ge 2^m + 1$ for all $m \ge 3$

Proof Suppose C is an $[n = 2^m, 2^m - (2m - 1)]$-code with covering radius
two. We infer from Property 1 of Lemma 5 that, for $m \ge 3$, code $C^\perp$ does
not contain the all-one vector. If a codeword of weight $w \ne 0$ occurs in $C^\perp$,
then we have $K(v, 1) \le w 2^{v - (2m - 2)}$ with $v + w = n$, according to Property 2
of Lemma 5.

For even v we have the lower bound $K(v, 1) \ge 2^v/v$, cf. equation (3). Thus
we obtain the inequality $v w \ge \frac{1}{4} n^2$ for even v. Since $v + w = n$ we have in
fact equality and $w = n/2 = 2^{m-1}$. We infer that the even weight subcode
of $C^\perp$ of dimension $k \ge 2m - 2$ is in fact a one-weight code with $d = 2^{m-1}$,
hence it satisfies the divisibility constraint $2^{k-1} \mid d$ (cf. Lemma 4). However,
for $m \ge 3$ this divisibility constraint is not satisfied.

Hence $l(9, 2) \ge 33$, $l(11, 2) \ge 65$, $l(13, 2) \ge 129$, etc. □
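
The arithmetic behind this example can be spelled out as follows. The sketch assumes the reading of the bound as l(2m - 1, 2) >= 2^m + 1, the weight condition w(n + 1 - w) >= 2^{M-1} for codimension M = 2m - 1, and the divisibility condition 2^{k-1} | d with d = 2^{m-1} and k >= 2m - 2; the function names are illustrative.

```python
def all_one_vector_excluded(m):
    """Weight w = n = 2^m violates w(n + 1 - w) >= 2^(M-1) with M = 2m - 1."""
    n = 2 ** m
    codimension = 2 * m - 1
    return n * (n + 1 - n) < 2 ** (codimension - 1)

def divisibility_holds(m):
    """Check 2^(k-1) | d for d = 2^(m-1) and the smallest admissible k = 2m - 2.

    A larger k only makes the divisibility condition harder to satisfy.
    """
    k = 2 * m - 2
    d = 2 ** (m - 1)
    return d % 2 ** (k - 1) == 0

def length_lower_bound(m):
    """The resulting lower bound 2^m + 1 on l(2m - 1, 2)."""
    return 2 ** m + 1

if __name__ == "__main__":
    for m in range(3, 8):
        print(2 * m - 1, all_one_vector_excluded(m),
              divisibility_holds(m), length_lower_bound(m))
```

For m = 5, 6, 7 this reproduces the values 33, 65 and 129 quoted for codimensions 9, 11 and 13.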

Remark 7 In a similar way we can prove the next bounds: $l(16, 2) \ge 363$,
$l(18, 2) \ge 725$, $l(20, 2) \ge 1449$, $l(22, 2) \ge 2897$. □

Our approach can be extended into several directions, enabling us to prove
the bounds $l(6, 2) = 13$, $l(7, 2) = 19$, $l(8, 2) \ge 25$, $l(9, 2) \ge 34$, $l(8, 3) = 14$,
$l(9, 3) \ge 17$, $l(10, 3) \ge 21$, $l(12, 3) \ge 31$, $l(13, 3) \ge 38$.

Lemma 5 Let C be an [n, n - m]-code with covering radius two. Then the
weights $w \ne 0$ in $C^\perp$ satisfy the following properties:

1. $w(n + 1 - w) \ge 2^{m-1}$

2. $w 2^{(n-w)-(m-1)} \ge K(n - w, 1)$

3. if weight w cannot occur, then weight (n + 1) - w cannot occur either

Abstract. We study partitions of the space of all the binary n-tuples
into disjoint sets, such that each set is an additive coset of a given set V. Such
a partition is called a perfect tiling of $F_2^n$ and denoted (V, A), where A is the set
of coset representatives. A sufficient condition for a set V to be a tile is given in
terms of the cardinality of V + V. A perfect tiling (V, A) is said to be proper if V
generates $F_2^n$. We show that the classification of perfect tilings can be reduced
to the study of proper perfect tilings. We then prove that each proper perfect
tiling is uniquely associated with a perfect binary code. A construction of proper
perfect tilings from perfect binary codes is presented. Furthermore, we introduce
a class of perfect tilings obtained by iterating a simple recursive construction.
Finally, we generalize the well-known Lloyd theorem, originally stated for tilings
by spheres, for the case of arbitrary perfect tilings.

Given a body in the n-dimensional Euclidean space, is it possible to tile
the space with exact copies of this body? This problem has been extensively
studied in the classical literature, see [6] and references therein. We study
here the binary version of this problem. Let $F_2^n$ denote the n-dimensional
Hamming space, i.e. the set of all binary n-tuples with addition term by term
modulo 2. A given set (body) V tiles $F_2^n$ if it is possible to perfectly cover $F_2^n$
with disjoint additive cosets of V. Note that the set of coset representatives A
is also a tile of $F_2^n$. Without loss of generality we assume that both V and A
contain the 0 element. Evidently, each element x of $F_2^n$ must have a unique
representation of the form x = v + a, where $v \in V$ and $a \in A$. Thus we have
the following definition of a perfect tiling: (V, A) is a perfect tiling of $F_2^n$ if
$V + A = F_2^n$ and $(V + V) \cap (A + A) = \{0\}$.
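
For small n the definition can be tested exhaustively. The sketch below checks that every binary n-tuple has exactly one representation v + a, using as an assumed example the radius-1 Hamming sphere in the 3-dimensional space together with the repetition code; the helper names are illustrative.

```python
def add(u, v):
    """Componentwise sum modulo 2."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

def sumset(X, Y):
    """The set X + Y = {x + y : x in X, y in Y}."""
    return {add(x, y) for x in X for y in Y}

def is_perfect_tiling(V, A, n):
    """True if every binary n-tuple is v + a for exactly one pair (v, a)."""
    sums = [add(v, a) for v in V for a in A]
    return len(sums) == 2 ** n and len(set(sums)) == 2 ** n

def difference_sets_meet_trivially(V, A):
    """True if (V + V) and (A + A) share only the zero vector."""
    common = sumset(V, V) & sumset(A, A)
    return all(not any(x) for x in common) and len(common) == 1

SPHERE = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
REPETITION = [(0, 0, 0), (1, 1, 1)]

if __name__ == "__main__":
    print(is_perfect_tiling(SPHERE, REPETITION, 3))
    print(difference_sets_meet_trivially(SPHERE, REPETITION))
    print(len(sumset(SPHERE, SPHERE)))
```

The two tests agree on this example: unique representation and the condition on (V + V) and (A + A) both hold.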

If both V and A are groups, (V, A) is a perfect tiling of $F_2^n$ iff $A = F_2^n/V$.
Hence in the sequel we consider only nonlinear tilings where at least one of
the sets V, A is not a group. A well-known example of a nonlinear tile is a
sphere, in which case the set of coset representatives is a perfect binary code.
Tilings by generalized spheres have been studied in [2] and [3]. The following
proposition shows that many more perfect tilings exist.

Proposition 1. If $|V + V| < 2|V|$ there exists a group A, such that (V, A) is
a perfect tiling.

In particular, since $|V| \le |V + V| \le \binom{|V|}{2} + 1$, any set of cardinality 4 is a
tile by Proposition 1. If $|V + V|$ is large it is sometimes possible to show that
no tiling is possible. Certain bounds on the cardinality of V + V for a given
set V may be found in [7].

We shall say that (V, A) is a proper perfect tiling of $F_2^n$ if (V, A) is a perfect
tiling of $F_2^n$ and V generates $F_2^n$, i.e. $\langle V \rangle = F_2^n$, where $\langle V \rangle$ denotes the span
of V. We now prove that the classification of perfect tilings can be reduced
to the study of proper perfect tilings.

Proposition 2. A set V is a tile of $F_2^n$ iff it is a tile of $\langle V \rangle$. Furthermore all
the sets A, such that (V, A) perfectly tiles $F_2^n$, can be constructed as follows.
Denote $m = 2^{n-r} - 1$, where r is the rank of V.

1. Let $A_0, A_1, \ldots, A_m$ be some m + 1, not necessarily distinct, subsets of $\langle V \rangle$
such that for all i, $0 \le i \le m$, $(V, A_i)$ is a proper perfect tiling of $\langle V \rangle$.

2. Let $c_0 = 0, c_1, \ldots, c_m$ be a set of representatives of $F_2^n/\langle V \rangle$.

3. For $1 \le i \le m$, let $u_i$ be any element of $\langle V \rangle$.

Then $A = A_0 \cup (u_1 + c_1 + A_1) \cup \ldots \cup (u_m + c_m + A_m)$.

The foregoing proposition shows that all the perfect tilings may be
constructed from proper perfect tilings. Therefore, we shall henceforth assume
that n = rank(V), and identify $F_2^n$ with $\langle V \rangle$. With an appropriate choice of
basis for $F_2^n$, it may be further assumed that $V \supseteq B_n(0, 1)$, where $B_n(0, 1)$ is
a Hamming sphere of radius 1 in $F_2^n$. Some of the facts which we were able
to demonstrate for proper perfect tilings are listed below.

1. If $|V + V| = 2|V|$, V is a tile iff (V + V) is not a group.

2. Obviously, if $n = |V| - 1$ then $V = B_n(0, 1)$ is a tile, and A is a perfect
binary code. If $n = |V| - 2$ then V is also a tile, and A is a shortened
Hamming code.

3. If $|V| \le 8$ and (V, A) is a proper perfect tiling, then A is a group.

Let (V, A) be a perfect tiling, and let H(V) be an $n \times (|V| - 1)$ matrix having
the elements of $V \setminus \{0\}$ as its columns. For $x \in F_2^{|V|-1}$ set $s(x) = H(V)x^T$

and define the codes C and $C_0$ as follows:

$C = \{c \in F_2^{|V|-1} : s(c) \in A\}$    $C_0 = \{c \in F_2^{|V|-1} : s(c) = 0\}$

Proposition 3. The code C is a perfect binary code with minimum
distance 3, and $C_0$ is a linear subcode of C. Furthermore, if (V, A) is a proper
perfect tiling then $|C| = |C_0| \cdot |A|$, rank(C) = |V| + rank(A) - rank(V) - 1,
and C is linear iff A is a group.
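
A small brute-force sketch of the construction in Proposition 3 follows. It builds the code of all binary words c of length |V| - 1 whose syndrome H(V)c^T lies in A, and checks perfection by the sphere-packing count together with the minimum distance. The two test tilings (the whole 3-dimensional space with A = {0}, and the radius-1 sphere with the repetition code) and all function names are assumptions chosen for illustration.

```python
from itertools import product

def code_from_tiling(V, A):
    """All c of length |V| - 1 with H(V) c^T in A, columns of H(V) = V minus {0}."""
    columns = [v for v in V if any(v)]
    n = len(columns[0])
    targets = set(A)
    code = []
    for c in product((0, 1), repeat=len(columns)):
        s = tuple(
            sum(c[j] * columns[j][i] for j in range(len(columns))) % 2
            for i in range(n)
        )
        if s in targets:
            code.append(c)
    return code

def min_distance(code):
    """Minimum Hamming distance between distinct codewords."""
    return min(
        sum(a != b for a, b in zip(x, y))
        for i, x in enumerate(code) for y in code[i + 1:]
    )

def is_perfect_distance_3(code):
    """Sphere-packing equality |C| (N + 1) = 2^N together with distance 3."""
    length = len(code[0])
    return len(code) * (length + 1) == 2 ** length and min_distance(code) == 3

WHOLE_SPACE = list(product((0, 1), repeat=3))
SPHERE = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
REPETITION = [(0, 0, 0), (1, 1, 1)]

if __name__ == "__main__":
    hamming = code_from_tiling(WHOLE_SPACE, [(0, 0, 0)])
    print(len(hamming), is_perfect_distance_3(hamming))
    small = code_from_tiling(SPHERE, REPETITION)
    print(small, is_perfect_distance_3(small))
```

In the first case the construction returns the [7,4] Hamming code; in the second, H(V) is the identity matrix and the code coincides with A.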

Proposition 3 shows that each perfect tiling is uniquely associated with a
perfect binary code, and provides a means for constructing perfect codes from
perfect tilings. The converse construction is also possible.

Proposition 4 (Converse of Proposition 3). Let C be a perfect binary code
of length n with minimum distance 3. Let $\Gamma$ be a linear code of dimension $\gamma$,
such that $C + \Gamma = C$. Let $H(\Gamma)$ be a parity check matrix of $\Gamma$. If $V \setminus \{0\}$ is
the set of rows of $H(\Gamma)^T$ and $A = \{H(\Gamma)c^T : c \in C\}$, then (V, A) is a proper
perfect tiling of $F_2^{n-\gamma}$.

Note that $\gamma$ is possibly 0, in which case V is a sphere and A = C. If $\gamma \ne 0$,
many non-equivalent perfect tilings may be constructed from the same perfect
binary code. In this case the construction of Proposition 4 is not explicit, as
there is no obvious way to find the code $\Gamma$. Several explicit constructions of
proper perfect tilings from perfect binary codes will be presented elsewhere.

The correspondence between sets V, A such that $V + A = F_2^n$ and
coverings by spheres of radius 1 was first noticed in [1]. The relevance
of their rank, however, seems to have been overlooked. We will hereafter
elaborate on this issue. First we show that many proper perfect tilings have
a recursive structure analogous to the structure of perfect tilings exhibited
in Proposition 2. Suppose that (V, A) is a proper perfect tiling of $F_2^n$, with
rank(A) < rank(V) = n. Then (A, V) is a perfect tiling of $F_2^n$. Applying
Proposition 2 to (A, V) yields

$V = V_0 \cup (a_1 + c_1 + V_1) \cup \ldots \cup (a_m + c_m + V_m)$,

where for i = 0, 1, ..., m, $(A, V_i)$ is a proper perfect tiling of $\langle A \rangle$, $c_0, c_1, \ldots, c_m$
are the representatives of $F_2^n/\langle A \rangle$, and $a_i \in \langle A \rangle$. The same argument can now
be applied to each of the tilings $(A, V_i)$, provided that $rank(V_i) < rank(A)$
for all i = 0, 1, ..., m. This defines a class of tilings obtained by recursively
iterating the construction of Proposition 2. The recursion terminates only if
a proper perfect tiling with rank(A) = rank(V) is encountered. Such a tiling
is said to be of full rank. Full-rank perfect tilings have been constructed
by Etzion and Vardy in [4]. In view of the foregoing discussion they may be
considered as the "building blocks" of all the perfect tilings.

For demonstrating the non-existence of certain tilings the following
generalization of the Lloyd theorem may be useful. Let $\chi_u(V) = \sum_{v \in V} (-1)^{\langle u, v \rangle}$
be a character of the group algebra $QF_2^n$ (cf. [5], chap. 5). For a perfect tiling
(V, A), define the sets U, N(U), A' and N(A') as follows:

$U = \{u : \chi_u(V) = 0\}$    $N(U) = \{j : \exists u \in U \text{ with } wt(u) = j\}$,

$A' = \{a' : \chi_{a'}(A) \ne 0\}$    $N(A') = \{j : \exists a' \in A' \text{ with } wt(a') = j\}$.

Note that the set A' may be regarded as the code formally dual to A.
Proposition 5. In the above notation $N(A') \subseteq N(U)$ and $|U| \ge |V|$.

We prove that the code of the title is unique by showing it can be extended to
a constant-weight-4 code of type (10,30,4). The uniqueness of the latter code was
proved by Witt in 1938 (in the language of Steiner systems).

Acknowledgement. The author is grateful for support from ENST, Paris, and
INRIA, Rocquencourt, where this work was done.

SUMMARY

Two-user data communications is considered in which
the users each transmit Pulse-Amplitude Modulated data signals
through linear, time-invariant channels with transfer functions
$H_1(f)$ and $H_2(f)$, respectively. The received signal is the sum of
the outputs of these channels plus white Gaussian noise. Assuming
that the symbol rate is the same for each user, and that the users are
not allowed to coordinate their transmissions on a per symbol basis,
we study the problem of optimizing their transmitted pulse shapes.

Two types of receivers are considered: the matched
filter detector, which attempts to demodulate each user
independently, while treating interference from the other user as
wide-sense stationary noise, and the Minimum Mean Squared Error
(MMSE) linear detector, which jointly demodulates both users.
For each case necessary conditions are derived for
the transmitted pulse shapes that minimize the Mean Squared Error
(MSE), subject to an average power constraint, and conditions are
given for which the corresponding solution is unique. Our results
generalize those in [1], in which the MMSE linear transmitter and
receiver filters for a single-user channel are derived.

User i, $i \in \{1, 2\}$, generates a sequence of pulses

$s_i(t) = \sum_k b_k^{(i)} \delta(t - kT)$ (1)

where $\{b_k^{(i)}\}$ is the sequence of transmitted data symbols from user
i, and 1/T is the symbol rate, which is assumed to be the same for
both users. The received signal is then

$y(t) = h_1 * p_1 * s_1(t) + h_2 * p_2 * s_2(t) + n(t)$ (2)

where $p_1(t)$ and $p_2(t)$ are the pulse shapes for each user, $h_1(t)$ and
$h_2(t)$ are the impulse response functions associated with $H_1(f)$ and
$H_2(f)$, respectively, $*$ denotes convolution, and n(t) is white
Gaussian noise with spectral density $\sigma_n^2$.

For the matched filter detector the output of $H_i(f)$ is the
input to the filter with transfer function $P_i^*(f) H_i^*(f)$, where $P_i(f)$ is
the Fourier Transform of $p_i(t)$. The output of this filter is sampled
at rate 1/T, which produces the estimated sequence of symbols.
The MSE for the matched filter receiver is

Tall,  /  UIIQill^-l)^+lQlQ2l^+^IQ,il^-4r  (3) 

-iliJT) 

where  al  =  ^  =  alfal,  and 

lQi(/)J*  =  Pi\f-{k-l-Kyr]H,lf-ik-l-K)m  (4) 

i  =  k^l . 2A+1.  The  three  terms  in  the  integrand  can  be 

classified as MSE due to ISI, multiple-access interference, and
noise.

The MMSE linear detector in general offers a
significant performance improvement relative to the matched filter
detector. The MMSE linear detector for this multi-user channel
consists of matched filters followed by symbol-rate samplers, and a
2-input/2-output digital filter. The MSE in this case is given by [2]

1/(2D  2E  +  IIQill^+IIQ2lP 

MMSE  =  Tal  f  - - - - 7-^ — ; - r#(5) 

-Jar,  (^  +  tKii  11^X4 +  llQ2ll^)-  IQ;Q2I^ 

We now wish to find transmitter pulses, specified by
$P_1(f)$ and $P_2(f)$, to minimize the expressions for MSE given by (3)
and (5) subject to the average power constraints

mr,  g 

T  j  I  IFi(f-*/r)l^  S  Hj,  1  =  1,2.  (6) 

-i/(2D  [t=-jr 


To specify the solution to these optimization problems
we need the following notation. For every $f \in [-1/(2T), 1/(2T)]$
define $k_i(f)$ to be any integer for which
$|H_i(f - k_i/T)| \ge |H_i(f - k/T)|$ for all integers k. Provided that
$H_i(f)$, i = 1, 2, satisfy some relatively weak conditions (i.e.,
$|H_1(f)|$ cannot be a constant times $|H_2(f)|$ on a set of positive
measure), then the MMSE transmitter filter for the MMSE linear
detector is given by


IPi(/-ki/T)l^  = 


)//,(/ -vr)  I  ^ ; 


IHi(/-k,/T)l 


/e  G<(/),  where  1/  I  < 


Gn(f)  =  I/:  \Hi{f-kilT)\  >  I  (8a) 


Gi2{f)  = 


l//,(/^/7)l^  ^  K 
\Hjlf-kj/T)\'^  h 


(8b) 


where $i \ne j$. For $f \notin G_{i1} \cap G_{i2}$ and for $k \ne k_i$, $|P_i(f - k/T)| = 0$.
The constants $\lambda_1$ and $\lambda_2$ are selected to satisfy the constraint (6).
We show that the solution to (7)-(8) is unique, subject to appropriate
restrictions on $H_i(f)$.

Note that where $|P_i(f)| \ne 0$, it has the same form as
the MMSE transmitter filter for the single user channel with transfer
function $H_i(f)$. The MMSE transmitter filters for the matched filter
receiver also have this property. Since $meas(G_{12} \cap G_{22}) = 0$, the
preceding results imply that for the MMSE receiver and the type of
multiple-access channel considered, Frequency Division Multiple
Access (FDMA) is optimal. This is also true for the matched filter
detector. Specific examples of $H_1(f)$ and $H_2(f)$ along with
optimized pulse shapes and an associated comparison of MSE are
planned for presentation at the conference.

A discrete-time symbol-synchronous code-division
multiple-access (S-CDMA) system is considered
where K independent users spread their real-valued
encoded symbols $B_k$, k = 1, ..., K, by individual real
signature sequences $s_k$ of length L 'chips' and, then,
transmit the L-dimensional symbols $B_k s_k$ over a
Gaussian multiple-access (GMAC, [1]) channel with
noise correlation matrix $E[NN^T] = \sigma^2 I_L$ (T denotes
transposition and $I_L$ is the L x L identity matrix).
Moreover, the same symbol-energy constraint is
assumed for all users, i.e., $E[B_k^2] \le \mathcal{E}_c$ and $s_k^T s_k = L$.
For convenience, the L-dimensional observation
vector at the output of the GMAC channel is written
as

$Y = SB + N$ (1)

where the sequence matrix $S = [s_1, \ldots, s_K]$ and the
symbol vector $B = [B_1, \ldots, B_K]^T$. Finally, an
optimum multiuser receiver is supposed which makes a
joint maximum-likelihood decision of all information
data.

The goal of this presentation is to find those
sequence multisets, consisting of K not necessarily
different sequences, which enable the K users to
communicate reliably and fairly with maximum sum rate.
As a consequence, equation (1) is viewed as an
S-CDMA channel having a capacity region C(S) which
is a function of the sequence matrix. The criterion of
goodness for a sequence multiset is chosen to be the
largeness of the symmetric capacity $C_{sym}(S)$ per chip
where $C_{sym}(S)$ is defined by the maximum achievable
equal-rate point in the capacity region C(S). It
is achieved with zero-mean Gaussian distributed
encoded symbols of maximum allowed variance $\mathcal{E}_c$ [2].


Theorem 1: Let the real sequence matrix $S = [s_1, \ldots, s_K]$
consist of K L-dimensional sequences
$s_k$ of equal energy L. Then,

$C_{sym}(S) \le \frac{1}{2} \log(1 + K \mathcal{E}_c/\sigma^2)$ [bits/chip]

with equality if and only if the L rows of S are
orthogonal and have equal squared norm K, i.e. $SS^T = K I_L$.

This upper bound is equal to the sum capacity of
a GMAC with noise variance $\sigma^2$ and K chip-inputs
of equal energy $\mathcal{E}_c$ [1, p.378]. Therefore, it can be
concluded that (chip-)dimensions can be used most
efficiently in a fair communication as long as $SS^T = K I_L$
in S-CDMA. Note that a necessary condition
for $SS^T = K I_L$ is $K \ge L$. Moreover, $SS^T = K I_L$ is
the necessary and sufficient condition for a sequence
multiset to meet Welch's lower bound on the sum of
the squares of the inner products between all pairs
of K equal-energy sequences [3].
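
The equality condition can be checked on a toy multiset. The sketch below stores S as L rows of length K, tests whether S S^T equals K times the identity, and compares the sum of squared inner products between all ordered pairs of sequences with the Welch value K^2 L that applies to K sequences of energy L. The bound formula in `symmetric_capacity_bound` is the reading of Theorem 1 as (1/2) log2(1 + K E_c / sigma^2) and is an assumption, as are all names.

```python
import math

def row_gram(S):
    """S S^T for S given as a list of L rows, each of length K."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in S] for r1 in S]

def satisfies_equality_condition(S):
    """True if S S^T = K * I_L."""
    L, K = len(S), len(S[0])
    gram = row_gram(S)
    return all(gram[i][j] == (K if i == j else 0)
               for i in range(L) for j in range(L))

def squared_correlation_sum(S):
    """Sum over all ordered pairs (i, j) of (s_i^T s_j)^2, s_k = k-th column."""
    columns = list(zip(*S))
    return sum(sum(a * b for a, b in zip(u, v)) ** 2
               for u in columns for v in columns)

def welch_value(S):
    """Welch lower bound K^2 * L for K sequences of energy L in dimension L."""
    L, K = len(S), len(S[0])
    return K * K * L

def symmetric_capacity_bound(K, snr):
    """Assumed form of the bound: 0.5 * log2(1 + K * snr), snr = E_c / sigma^2."""
    return 0.5 * math.log2(1.0 + K * snr)

GOOD = [[1, 1, 1, 1], [1, -1, 1, -1]]   # L = 2 chips, K = 4 users
BAD = [[1, 1, 1, 1], [1, 1, 1, 1]]      # all users share one sequence

if __name__ == "__main__":
    for S in (GOOD, BAD):
        print(satisfies_equality_condition(S),
              squared_correlation_sum(S), welch_value(S))
    print(symmetric_capacity_bound(4, 1.0))
```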

In the presentation, Theorem 1 will also be
generalized for the case of two-dimensional modulation.
Additionally, further properties of Welch-bound-equality
sequence multisets will be mentioned.

Abstract

The determinant of the correlation matrix between n time-limited
unit-energy signals can be seen as a measure of orthogonality of the
signal set. The problem of designing a signal set that maximizes this
determinant is considered under the average as well as the maximum
root mean square (RMS) bandwidth constraints.

Summary

In multiuser communication, an important signal design problem
is to choose a set of unit-energy signature signals that are optimally
orthogonal. This corresponds to a correlation matrix that is as
close as possible to the identity matrix. If there is no constraint on
the bandwidth, it is possible to choose orthogonal signals. However a
nontrivial constraint on the bandwidth necessitates that the signals
be non-orthogonal. The optimality measure on the correlation
matrix is defined in this paper to be its determinant. This measure is
chosen due in part to its significance in the PAM synchronous
Gaussian CDMA channel where the capacity region was characterized in
the high signal-to-noise ratio regions via the total asymptotic
efficiency which in turn was shown to be upper bounded (achievably)
by the determinant of the correlation matrix [1]. The bandwidth of
each time-limited signal is defined to be its root mean square (RMS)
bandwidth (cf. [2]).

In this paper we consider the problem of finding the optimally
orthogonal unit-energy signals whose maximum RMS bandwidth is
bounded by $B_0$. A solution to this problem is found by solving the
problem with a weaker constraint, that of an average RMS bandwidth
bounded by $B_0$, and establishing (constructively) the existence of a
solution to the latter problem that consists of signals with equal RMS
bandwidths. These signals will therefore also be a solution to the
maximum RMS bandwidth problem, for if there were another signal
set meeting the latter constraint and having a higher determinant,
this set would supplant the original solution to the average RMS
bandwidth problem.

The main result in [2] provides a closed-form solution for the set
of signals that achieve the minimum average RMS bandwidth among
all signal sets that have a given correlation matrix R. The signals
have the following form:

$s_k(t) = (2/T)^{1/2} \sum_{j=1}^{n} c_{kj} \sin(j\pi t/T)$, $0 \le t \le T$, $1 \le k \le n$, (1)

where T is the duration of the signal and n is the number of
signals. The solution for the coefficient matrix $C = \{c_{kj}\}$ is given by
$C = V\Lambda^{1/2}$, where $\Lambda = diag(\lambda_1, \ldots, \lambda_n)$, $\lambda_i \ge \lambda_{i+1}$ are the ordered
eigenvalues of R and V is the matrix of eigenvectors of R in its
spectral decomposition, $R = V\Lambda V^T$. Furthermore, these signals have
individual and average RMS bandwidths given by

$B_k^2 = (2T)^{-2} (V\Lambda\Pi V^T)_{kk}$ and $B^2 = (2T)^{-2} n^{-1} trace(\Lambda\Pi)$, (2)

where $\Pi$ is defined to be a diagonal matrix with $\Pi_{ii} = i^2$.

In light of the result of [2] stated above, the average RMS
bandwidth problem can be equivalently posed as that of finding a
non-negative definite unit-diagonal correlation matrix R with maximum
determinant under the constraint that the corresponding minimum
average bandwidth given by (2) is bounded by $B_0$. The problem can
be formulated entirely in terms of the n eigenvalues of the correlation
matrix since $det(R) = det(\Lambda)$, the bandwidth constraint is expressed
only in terms of $\Lambda$, and the constraints on R can also be represented
by positivity and trace constraints on $\Lambda$. The average RMS problem
is now expressed as:

max $det(\Lambda)$ (3)

subject to

$\lambda_i \ge \lambda_{i+1} \ge 0$, (4)

$trace(\Lambda) = n$, (5)

$trace(\Lambda\Pi) \le b$ with $b = B_0^2 (2T)^2 n$. (6)

Non-trivial solutions for $\Lambda$ exist when b is restricted to the
range $(n, n(n + 1)(2n + 1)/6)$.

The first result in this paper simplifies the nonlinear optimization
problem (3) - (6) by showing that the ordering constraint in (4) can be
relaxed to only a positivity constraint and the inequality constraint
in (6) can be changed to the corresponding equality constraint. The
new constraints are thus $\Lambda > 0$, $trace(\Lambda) = n$ and $trace(\Lambda\Pi) = b$. A
proof of this result involves showing the suboptimality of any set of
eigenvalues that either violates the ordered property or fails to use all
the bandwidth given in the constraint. The problem is then solved
using the Lagrange multiplier technique which involves finding the
Lagrange multipliers numerically as a solution to a set of nonlinear
equations. These equations are then solved by standard numerical
techniques.
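
One way such a numerical solution could look is sketched below. This is not the procedure of the paper; it is a reduction worked out here under the relaxed constraints $\lambda_i > 0$, $\sum_i \lambda_i = n$ and $\sum_i i^2 \lambda_i = b$. Maximizing $\sum_i \log \lambda_i$ gives the stationarity condition $\lambda_i = 1/(\mu + \nu i^2)$; summing $\lambda_i(\mu + \nu i^2) = 1$ over i yields $\mu = 1 - \nu b/n$, so a single multiplier $\nu > 0$ remains and can be found by bisection. The function name and the bracketing strategy are assumptions.

```python
def optimal_eigenvalues(n, b, iterations=200):
    """Eigenvalues maximizing prod(lam) with sum(lam) = n, sum(i^2 lam_i) = b.

    Assumes n < b < n(n+1)(2n+1)/6, the non-trivial range of b.
    """
    upper = n * (n + 1) * (2 * n + 1) / 6.0
    if not (n < b < upper):
        raise ValueError("b must lie strictly between n and n(n+1)(2n+1)/6")
    shift = b / n

    def lam(nu):
        # Stationary point for a given multiplier nu, with mu = 1 - nu * b / n.
        return [1.0 / (1.0 + nu * (i * i - shift)) for i in range(1, n + 1)]

    def g(nu):
        # Residual of the trace constraint sum(lam) = n.
        return sum(lam(nu)) - n

    # lam_1 stays positive only for nu < 1 / (b/n - 1); g -> +inf at that pole.
    hi = (1.0 / (shift - 1.0)) * (1.0 - 1e-12)
    lo = hi / 2.0
    while g(lo) >= 0.0:  # g is negative between 0 and the wanted root
        hi = lo
        lo /= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return lam(0.5 * (lo + hi))

if __name__ == "__main__":
    values = optimal_eigenvalues(4, 10.0)
    print(values, sum(values),
          sum((i + 1) ** 2 * x for i, x in enumerate(values)))
```

For n = 2 the two equality constraints already fix the eigenvalues, which gives a simple check: with b = 3 the result is (5/3, 1/3). For a positive multiplier the eigenvalues come out in decreasing order, consistent with the ordering constraint (4) being inactive.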

The next result of the paper gives a constructive procedure for
finding a positive definite unit-diagonal matrix R of size n with the
specified optimal eigenvalues. The procedure involves finding the
optimal correlation matrix by starting with $\Lambda$ and performing
at most n - 1 rotations such that, with V the resulting orthogonal
transformation, the correlation matrix $R = V\Lambda V^T$ has
unit-diagonal elements and in addition, the matrix $V\Lambda\Pi V^T$ has equal
diagonal elements. This ensures equal bandwidths of the optimal
signal set, thereby solving the maximum RMS problem simultaneously.
The problem of finding the optimal signal set is now identical to the
problem solved in [2].

Finally, if n is such that a Hadamard matrix of dimension n
exists, a single unitary (Hadamard) transformation is also shown to
yield a signal set that solves the maximum RMS bandwidth problem.
Hadamard matrices exist for all dimensions which are a power of 2
but also many others (cf. [3]).

References

[1] Verdú, S., "Capacity region of Gaussian CDMA channels: the
symbol-synchronous case," Proc. of the Twenty-fourth Allerton
Conference on Communication, Control and Computing, Allerton,
IL, pp. 1025-1034, October, 1985.

[2] Nuttall, A.H., "Minimum rms bandwidth of M time-limited
signals with specified code or correlation matrix," IEEE Transactions
on Information Theory, vol. IT-14, pp. 699-707, September
1968.

[3] Hall Jr., Marshall, Combinatorial Theory, Blaisdell Publishing
Company, Waltham, MA, 1967.




ON  ACHIEVABLE  INTER-USER  ORTHOGONALITY  FOR  MULTI-USER 
COMMUNICATION  SYSTEMS  IN  MULTIPATH  FADING  ENVIRONMENTS 


JURG  RUPRECHT 

Swiss  PTT  General  Directorate,  R&D,  Mobile  Communications  VD  2,  3000  Berne  29,  Switzerland 


Abstract — This paper proposes and compares three broadband
modulation/demodulation schemes for use in a multipath
fading environment. They are all based on CDMA, are
of a broadband nature in order to combat frequency selective
fading and achieve a certain degree of orthogonality in order
to enhance spectral efficiency.

I.  Introduction 

In  multi-user  communication  systems  where  the  users  access  the  same 
channel,  the  optimum  receiver  is  in  the  general  case  a  joint  detection 
receiver,  i.e.,  the  data  symbols  of  all  users  have  to  be  detected  jointly. 
For  a  large  number  of  users,  this  receiver  is  very  complex  and  in  most 
cases  practically  not  implementable.  To  avoid  joint  detection,  orthog¬ 
onal  modulation/demodulation  schemes  are  desirable  where  a  single 
user  detector  has  the  same  performance  independently  of  the  number 
of  active  users. 

In  an  AWGN  environment,  FDMA,  TDMA  and,  for  proper  spreading 
code  choices,  CDMA  are  examples  of  such  orthogonal  modulation/de¬ 
modulation  schemes.  When  properly  implemented,  FDMA  and  TDMA 
keep  inter-user  orthogonality  even  in  multipath  environments,  whereas 
conventional  CDMA  causes  inter-user  interference  due  to  the  lack  of 
a  sufficient  number  of  orthogonal  spreading  codes.  In  order  to  over¬ 
come  the  spectral  nulls  of  multipath  fading  channels,  mainly  wideband 
communication  systems  based  on  TDMA  and  CDMA  are  currently  con¬ 
sidered  for  future  mobile  communications  systems.  CDMA  offers  many 
advantages,  but  suffers  from  the  above-mentioned  inter-user  interfer¬ 
ence  in  multipath  environment  when  a  single  user  detector  is  applied. 
This  presentation  suggests  and  compares  several  approaches  to  orthog- 
onalize  the  users  in  a  synchronized  CDMA  system  in  a  multipath  fading 
environment. 

II.  Model 

A discrete-time multi-user communication system as shown in
Figure 1 is considered. Each user k (k = 0, 1, ..., K-1) modulates its
information sequence $b_k[.]$ with symbol rate $R_s$ by converting it into the
corresponding transmission sequence $x_k[.]$ with chip rate $R_c$. The total
transmission sequence $x[.] = \sum_k x_k[.]$ is then transmitted through a
common noiseless channel with impulse response h[.] and is further
distorted by a noise sequence z[.]. The corresponding received sequence
y[.] = h[.] * x[.] + z[.] then serves each demodulator as input for
its estimate $\hat{b}_k[.]$ of $b_k[.]$. The goal is to design orthogonal
modulation/demodulation schemes, i.e., the demodulator performance is the
same independently of the number $k \le K$ of active users.

III.  Conventional  Access  Techniques 

Figure 1: Multi-user communication system

In an AWGN environment, FDMA (where the users are assigned
non-overlapping spectra), TDMA (where the users are assigned
non-overlapping time instants) and CDMA (where the users are assigned
orthogonal spreading codes) provide orthogonal modulation/demodulation
schemes. In a multipath fading environment, these access techniques
perform as follows:

• FDMA still provides an orthogonal modulation/demodulation
scheme.  The  spectral  nulls,  however,  degrade  the  FDMA  perfor¬ 
mance  significantly  when  not  combined  with  frequency  hopping 
and/or  interleaving. 

•  TDMA  still  provides  an  orthogonal  modulation/demodulation 
scheme,  if  the  users  are  separated  in  time  such  that  no  inter-user 
interference  occurs.  The  system  bandwidth  is  limited  because  of 
equalizer  complexity. 

• Conventional CDMA no longer provides an orthogonal
modulation/demodulation scheme, and the Qualcomm approach [2]
provides orthogonality only on every single channel path, such
that the system becomes interference limited for single user
detectors.

Thus,  FDMA  and  TDMA  provide  orthogonal  modulation/demodula- 
tion  schemes  even  in  multipath  fading  environments,  but  yield  the 
stated  disadvantages.  On  the  other  hand,  CDMA  offers  the  above- 
mentioned  advantages. 
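
The model of Section II and the loss of orthogonality of conventional CDMA in multipath can be illustrated numerically. The following is a minimal sketch, not the authors' simulation: it assumes length-4 Walsh codes, one symbol per user, no noise, and a hypothetical two-path impulse response `h = [1.0, 0.5]`; all function names are illustrative.

```python
def walsh_codes(order):
    """Rows of a Sylvester-type Hadamard matrix of size 2**order (entries +1/-1)."""
    rows = [[1]]
    for _ in range(order):
        rows = [r + r for r in rows] + [r + [-c for c in r] for r in rows]
    return rows


def spread(symbols, code):
    """Convert a symbol sequence b_k into a chip sequence x_k."""
    return [s * c for s in symbols for c in code]


def channel(x, h, z=None):
    """y[n] = sum_m h[m] x[n-m] + z[n], truncated to the length of x."""
    y = []
    for n in range(len(x)):
        acc = sum(h[m] * x[n - m] for m in range(len(h)) if n - m >= 0)
        y.append(acc + (z[n] if z is not None else 0.0))
    return y


def despread(y, code):
    """Single user detector: correlate each symbol interval with the user's code."""
    length = len(code)
    return [sum(y[i * length + j] * code[j] for j in range(length))
            for i in range(len(y) // length)]


def user0_output(active_symbols, h):
    """Decision variable of user 0 given a dict {user index: symbol list}."""
    codes = walsh_codes(2)
    streams = [spread(sym, codes[k]) for k, sym in active_symbols.items()]
    x = [sum(chips) for chips in zip(*streams)]  # total transmission sequence
    return despread(channel(x, h), codes[0])


if __name__ == "__main__":
    for h in ([1.0], [1.0, 0.5]):
        for b1 in (+1, -1):
            print(h, b1, user0_output({0: [1], 1: [b1]}, h))
```

On the single-path channel the output of user 0 does not depend on the symbol of user 1, whereas on the two-path channel it does, which is the inter-user interference the text refers to.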

IV.  Proposed  Access  Techniques 

We therefore suggest and compare the following approaches to
orthogonalize the users in a synchronized CDMA (sCDMA) system in a
multipath fading environment:

• Code frequency division multiple access (CFDMA) is a new
accessing scheme. It spreads a narrowband FDMA system with a
spreading sequence in a CDMA fashion. CFDMA is orthogonal
on each channel path, but different channel paths interfere.

• Code time division multiple access (CTDMA) [1] spreads a
conventional TDMA system in a CDMA fashion, where all users
spread with the same sequence. An inverse filter receiver
guarantees an orthogonal modulation/demodulation scheme even in a
multipath fading environment.

• Code code division multiple access (CCDMA), as proposed by
Qualcomm as CDMA [2], scrambles a CDMA system (spread
by orthogonal Walsh codes) by an additional PN sequence. Like
CFDMA, CCDMA is orthogonal on each channel path, but
different channel paths interfere.

V.  Conclusion 

CTDMA  is  the  only  scheme  that  keeps  full  modulation/demodula¬ 
tion  orthogonality.  On  the  other  hand,  the  number  of  possible  users  is 
limited  by  the  maximum  excess  delay  of  the  channel  impulse  response. 
In CFDMA and CCDMA, only partial modulation/demodulation
orthogonality can be provided, but the user capacity of the system no
longer depends as severely on the maximum excess delay of the channel.
These  accessing  schemes  are  proposed  and  compared  in  performance 
for  different  situations. 

Literature 

[1] Ruprecht, J., Neeser, F.D., Hufschmid, M., “Code time division multiple
access: An indoor cellular system”, Proceedings of the 42nd IEEE Vehicular
Technology Conference, pp. 736-739, Denver, 1992.

[2] Salmasi, A., Gilhousen, K.S., “On the system design of code division
multiple access (CDMA) applied to digital and personal communication
networks”, Proceedings of the 41st IEEE Vehicular Technology Conference,
pp. 57-62, St. Louis, 1991.


CHANNEL  CODING  FOR  ASYNCHRONOUS  FIBEROPTIC 
CDMA  COMMUNICATIONS 


M. Dale                R. Gagliardi

TASC Corp.             Univ. of Southern Calif.

Reston, VA             Los Angeles, CA


Code  Division  Multiple  Access  (CDMA)  has  been 
proposed  as  a  possible  format  for  fiberoptic  networks.  The 
baseline  CDMA  uses  on-off  keying  (OOK)  of  binary  data 
with  a  unique  coded  pulse  sequence  transmitted  for  each 
on-bit.  Multiple  accessing  is  achieved  by  having  multiple 
sources,  each  with  its  own  code  sequence,  superimpose 
their  transmissions  over  a  common  fiber.  The  fibers  can 
then  be  interconnected  via  STAR  or  other  fiber  systems  to 
form the distribution network. Data bits are separated out at
a  receiving  terminal  by  recognizing  (correlating)  the  proper 
sequence  of  the  desired  source.  Pulse  code  sequences 
can  be  passively  generated  from  an  initial  OOK  laser  pulse 
by serial or parallel delay lines, and sequence correlation
can  be  achieved  optically  by  corresponding  matched  delay 
lines.  After  correlating  the  desired  sequence  to  a  peak 
value,  photodetection  followed  by  threshold  comparison 
can  be  used  to  detect  the  presence  or  absence  of  each  bit. 
Minimal  interference  multiple  accessing  is  achieved  by 
using  only  sets  of  pulse  code  sequences  that  have  low 
pairwise  crosscorrelations.  Optical  CDMA  has  the 
advantage  of  permitting  completely  asynchronous 
transmitters,  relatively  simple,  off-the-shelf  laser  sources, 
standard  photodetectors,  and  improved  power  levels  due 
to  the  laser  pulsing.  In  addition,  pulsed  CDMA  combines 
the  higher  speeds  of  optical  signals  with  the  more 
developed  electronic  processing  to  provide  maximum 
performance  efficiencies  in  converting  digital  data  to  optical 
transmission. 
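
The baseline scheme described above (OOK with a coded pulse sequence per on-bit, superposition on a common fiber, correlation, then threshold comparison) can be sketched as follows. This is an illustrative toy model under stated assumptions, not the authors' system: it is chip-synchronous, noiseless, and uses two hypothetical pulse-position codes of length 13 and weight 3 chosen to have pairwise crosscorrelation of at most one.

```python
N = 13                       # chips per bit (assumed)
CODE_A = (0, 1, 4)           # pulse positions of the desired source (assumed)
CODE_B = (0, 2, 7)           # pulse positions of an interfering source (assumed)


def transmit(bit, code, shift=0):
    """OOK: send the coded pulse sequence for an on-bit, nothing for an off-bit."""
    chips = [0] * N
    if bit:
        for p in code:
            chips[(p + shift) % N] = 1
    return chips


def superimpose(*signals):
    """Optical power of all sources adds on the common fiber."""
    return [sum(chips) for chips in zip(*signals)]


def detect(received, code):
    """Correlate with the desired sequence and compare with the code weight."""
    correlation = sum(received[p] for p in code)
    return int(correlation >= len(code)), correlation


if __name__ == "__main__":
    for shift in range(N):
        rx = superimpose(transmit(0, CODE_A), transmit(1, CODE_B, shift))
        print(shift, detect(rx, CODE_A))
```

With a single interferer the crosstalk never reaches the threshold; an off-bit can only be misread once enough interferers overlap the desired pulse positions, which is the origin of the error floor discussed in the text.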

A  prime  disadvantage  with  CDMA  is  the  sacrifice  in 
per-channel  data  rate  (relative  to  the  speeds  available  in 
the laser itself) that occurs in the insertion of code
addressing.  Another  important  CDMA  concern  is  the 
development  of  digital  crosstalk  between  channels  when 
multiplexing  many  simultaneous  sources.  Channel 
crosstalk  is  the  ultimate  limit  in  link  performance,  and 
produces  an  asymptotic  floor  to  the  error  probability  (PE) 
that  can  only  be  reduced  by  slowing  the  data  rate  with 
longer and higher weight CDMA sequences. This raises
the  question  of  whether  external  channel  coding  can  be 
more  effective  in  reducing  the  PE  floor. 

In this paper we consider the use of external channel
coding, in the form of either forward error correction or
modified waveform encoding, to aid in mitigating this
crosstalk buildup and produce more efficient individual
channel performance. While the advantages of channel
coding are well known for the classical Gaussian noise
channel, the application to the optical CDMA crosstalk
channel is somewhat diverse, and care must be used in
inserting commonly accepted coding “gains”.

The channel coding is applied directly to the CDMA
fiber links. The channel coder converts the data bits to
binary symbols, which are then sent over the fiber link as
OOK symbols encoded with the transmitter code sequence.
Forward error correction in the form of Reed-Solomon (RS)
block codes and convolutional coding (CC) was
considered. It was assumed that all systems use the same
laser pulse width, number of transmitters, and channel data
rate. The CDMA code sequences were adjusted to
accommodate each type of coding. Figures 1 and 2 show
example results for RS and CC, indicating the reduction in
the PE crosstalk floor from the uncoded case, as a function
of the CDMA sequence weight.

In conclusion, we have shown that channel coding can
indeed be more effective in reducing the PE floor. Both
Reed-Solomon block codes and convolutional codes were
considered, with both showing improved PE performance
over the uncoded OOK-CDMA system at the same
information bit rate. The convolutional codes tended to
produce lower PE floors for the lower code weight values
and hence would tend to have the highest network capacities.
This  result  is  important  since  the  channel  coding  is  applied 
with  electronic  hardware  external  to  the  optics.  Hence  the 
channel  coding  should  have  limited  impact  on  the  overall 
system  cost. 

The use of PPM as a channel coding technique was also
considered, since it also reduces the PE floor at the
expense of data rate. It was shown that the PPM system
was not as effective as the uncoded and error correction
systems at the same data rate. Furthermore, PPM requires a
modification of the optical encoding and decoding
processing from the standard OOK-CDMA format.


A multi-dimensional signaling technique is described for use
in asynchronous multiple access laser communications. This
technique is based on interferometric signaling and can be thought
of as temporal and spatial coding of light. Transmitter
implementation requires a coherent source (a laser), signal
modulation electronics, and some special optics. Spatial modulation
is accomplished via an aperture populated with liquid crystal
(LC)  modulators  with  different  users  having  different  LC  dis¬ 
tributions.  These  distributions  are  chosen  to  yield  low  cross- 
correlation  between  the  (Fraunhofer)  diffraction  patterns.  The 
large  spatial  bandwidth  (e.g.,  10^  pixels)  of  each  laser  trans¬ 
mitter  aperture  is  utilized  for  user  coding,  while  temporal  cod¬ 
ing  is  used  for  information  signals.  Signal  recovery  is  based 
on  incoherent  optical  detection,  spatial  sampling,  and  electronic 
or  optical  matched  filtering  of  the  received  optical  beam  Fres¬ 
nel/Fraunhofer  diffraction  pattern.  With  electronic  filtering,  low 
to  medium  (e.g.,  3  Mbps)  data  rates  can  be  achieved.  With  a 
lenslet  array-based  incoherent  optical  correlator,  up  to  100  Mbps 
data  rates  can  be  tolerated. 

Assume the liquid crystal modulators are evenly placed on
a circular plane of radius $R$. For a laser with angular frequency
$\omega$ it can be shown that the light intensity $I(x,y)$ at position
$(x,y)$ of the receiver plane at a distance $L$ from the transmitter
is approximately given by

$$I(x,y) = \sum_{p=1}^{N} \sum_{q=1}^{N} a_p a_q \cos\left\{ \frac{\omega R}{cL} \left[ (\cos\theta_p - \cos\theta_q)\,x + (\sin\theta_p - \sin\theta_q)\,y \right] \right\},$$

where $c$ is the speed of light, $\theta_i = 2\pi i/N$ is the angular spacing
of the liquid crystal modulators, and $a_i$ is a phase modulation
equal to +1 or -1. In general, $a_i$ is a complex valued code symbol

$$a_i = r_i \exp\{j\phi_i\},$$

$r_i$ being the amplitude modulation and $\phi_i$ the phase modulation
of symbol $i$; this yields a more elaborate expression for the
received intensity. A codeword (or signaling mask) for user $k$ is
given by

$$A_k = (a_1(k), \ldots, a_N(k)),$$

and the multiple access codebook is given by

$$A = \{A_1, \ldots, A_U\},$$

where $N$ can be thought of as the block length of the code,
and $U$ is the number of users. As in sequences used in spread
spectrum applications, $A$ is designed to have codewords with
low cross-correlations and auto-correlations.

At the end of each signaling interval, a 2-D correlator in the
receiver crosscorrelates the sampled intensity with stored user
specific intensity functions to decide which transmitters are on.
The capability of this system to discriminate among multiple
users is demonstrated, and preliminary results on the design of
$A$ are shown.


Abstract

An optical orthogonal code (OOC) is a collection of binary sequences with
good auto- and cross-correlation properties; they were defined by Salehi and
others as a means of obtaining code division multiple access (CDMA) on fiber
optic networks. Up to now all work on OOC’s has assumed that the weight
of each codeword is the same. In this paper we develop bounds on the size
of OOC’s when this assumption is removed. In addition, we demonstrate
construction techniques for building such “variable weight” OOC’s. The
results demonstrate that it is possible to assign codewords with different weights
among the users. Changing the weight of a user’s signature sequence affects
that user’s performance; therefore this approach is useful for CDMA fiber
optic networks with multiple performance requirements among the users.

Summary

1. Background and Motivation

As the demand for personal communication services continues to rise,
multiple access techniques become ever more important. Code-division multiple
access (CDMA) is a kind of spread spectrum technology that enables many
users to share the same channel without interference by employing a unique
signature sequence to distinguish different users’ transmissions.

Optical orthogonal codes (OOC’s) were introduced by Salehi et al. [1-2] as
a means of obtaining code division multiple access among asynchronous users
on fiber optic networks. An OOC is a family of (0,1) sequences with good
auto- and cross-correlation properties, and a variable weight OOC is an OOC
in which the weight of each codeword is not constant over the code.

Throughout this summary, we use $W$, $L$, and $Q$ to denote the sets
$\{w_0, w_1, \ldots, w_p\}$, $\{\lambda_a^{(0)}, \lambda_a^{(1)}, \ldots, \lambda_a^{(p)}\}$,
and $\{q_0, q_1, \ldots, q_p\}$, respectively.

Definition: An $(n, W, L, \lambda_c, Q)$ variable weight optical orthogonal code $C$ is
a collection of binary $n$-tuples such that the following three properties hold:

• (Weight Distribution) Every $n$-tuple in $C$ has a Hamming weight
contained in the set $W$; furthermore, there are exactly $q_i \cdot |C|$ codewords
of weight $w_i$, i.e., $q_i$ indicates the fraction of codewords of weight $w_i$.

• (Auto-correlation Property) For any $x = (x_0, x_1, \ldots, x_{n-1}) \in C$ with
Hamming weight $w_i \in W$ and any integer $\tau$, $0 < \tau < n$,

$$\sum_{t=0}^{n-1} x_t x_{t \oplus \tau} \le \lambda_a^{(i)}.$$

• (Cross-correlation Property) For any $x = (x_0, x_1, \ldots, x_{n-1}) \in C$ and
any $y = (y_0, y_1, \ldots, y_{n-1}) \in C$ such that $x \ne y$ and any integer $\tau$,

$$\sum_{t=0}^{n-1} x_t y_{t \oplus \tau} \le \lambda_c.$$

Note: OOC’s were defined in terms of periodic correlation; thus the
addition in the subscripts above, denoted “$\oplus$”, is all modulo-$n$.

The definition of a variable weight OOC is a generalization of the definition
for OOC given in [1-2]. The use of OOC’s for multiple access is described in
[1-3].

• The auto-correlation constraint guarantees that each signature sequence
is unlike cyclic shifts of itself. This property is used to enable the
receiver to obtain synchronization.

• The cross-correlation constraint guarantees that each signature
sequence is unlike cyclic shifts of the other signature sequences. This
property is used to enable the receiver to estimate its message in the
presence of interference from other users.

A reasonable “figure of merit” for a code is the number of interfering users
necessary to cause the code to fail. For instance, assume synchronization has
been achieved; then the only errors the $j$th user can make are 0 → 1 errors,
and they can only occur when enough other users interfere to make the
correlation at the $j$th receiver exceed $w_{i(j)}$. (Here, $w_{i(j)}$ is the Hamming weight
of the $j$th user’s codeword.) Since each of those other users can contribute at
most $\lambda_c$ to the correlation, the performance “figure of merit” is $w_{i(j)}/\lambda_c$. In
a similar vein, the synchronization “figure of merit” is
$(w_{i(j)} - \lambda_a^{(i(j))})/\lambda_c$ for multiple-access synchronization and
$w_{i(j)} - \lambda_a^{(i(j))}$ for single-user synchronization. (Here, $\lambda_a^{(i)}$ is the
auto-correlation associated with codewords of weight $w_i$.)

From above, we can see that the weight of a user’s signature sequence
will strongly affect that user’s performance. Therefore, by assigning
codewords with different weights we are able to accommodate multiple performance
requirements among the network’s users.

2. New Results

This paper will detail new techniques for analyzing variable weight OOC’s
and related sequences. New methods of constructing such codes have been
found and new bounds on the size of such codes have been derived.

2.1. A New Bound

Define $\Phi(n, W, L, \lambda_c, Q)$ to be the cardinality of an optimal variable weight
optical orthogonal code with the given parameters, i.e.,

$$\Phi(n, W, L, \lambda_c, Q) = \max\{\,|C| : C \text{ is an } (n, W, L, \lambda_c, Q) \text{ variable weight OOC}\,\}.$$

We have derived a new upper bound on $\Phi(n, W, L, \lambda_c, Q)$ for
$\lambda_a^{(i)} \ge \lambda_c$ ($\lambda_a^{(i)} \in L$).

Theorem: Let $\lambda_a^{(i)} \ge \lambda_c$ ($\lambda_a^{(i)} \in L$). Then

$$\Phi(n, W, L, \lambda_c, Q) \le \frac{(n-1)(n-2)\cdots(n-\lambda_c)}{\sum_{i=0}^{p} [\ldots]}.$$

We note at this point that the technique used to prove this theorem is
immediately applicable to binary codes employed for CDMA when the
auto-correlation and/or the cross-correlation constraints are specified in terms of
aperiodic correlation as well.



2.2.  New  Constructions 

Several  new  approaches  for  constructing  variable  weight  OOC’s  have  been 
found.  Among  them: 

• We can use the balanced incomplete block design technique [3] to
construct $(n, \{w+1, w\}, \{2,2\}, 1, Q)$, $(n, \{2w, w\}, \{2,2\}, 1, Q)$,
$(n, \{2w+1, w\}, \{2,2\}, 1, Q)$, $(n, \{2w+1, w+1\}, \{2,2\}, 1, Q)$, and
$(n, \{2w+1, w+1, w\}, \{2,2,2\}, 1, Q)$ variable weight OOC’s for even $w$,
[...] $(n, \{w+1, w\}, \{1,1\}, 1, Q)$, $(n, \{2w, w\}, \{2,1\}, 1, Q)$, and
$(n, \{2w+1, w\}, \{2,1\}, 1, Q)$ variable weight OOC’s for odd $w$. Among
these constructions, the cardinality of the $(n, \{w+1, w\}, \{1,1\}, 1, Q)$
variable weight OOC reaches the upper bound of the last section; hence
it is optimal.

• We have generalized the recursive construction method of [1] to
construct variable weight OOC’s. The recursive construction technique
uses the codes which are constructed by previous techniques to
provide infinite families of codes.
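
The defining properties of a variable weight OOC can be checked by brute force for small codes. The sketch below is illustrative only: the length-25 code with one weight-4 and one weight-3 codeword is a hand-picked example, not one of the constructions of this paper, and the function names are assumptions.

```python
def indicator(positions, n):
    """Binary n-tuple with ones at the given positions."""
    x = [0] * n
    for p in positions:
        x[p % n] = 1
    return x


def periodic_correlation(x, y, tau):
    """sum_t x[t] * y[(t + tau) mod n]."""
    n = len(x)
    return sum(x[t] * y[(t + tau) % n] for t in range(n))


def max_autocorrelation(x):
    """Largest off-peak periodic auto-correlation (0 < tau < n)."""
    return max(periodic_correlation(x, x, tau) for tau in range(1, len(x)))


def max_crosscorrelation(x, y):
    """Largest periodic cross-correlation over all shifts."""
    return max(periodic_correlation(x, y, tau) for tau in range(len(x)))


N = 25
CODE = [indicator((0, 1, 3, 9), N), indicator((0, 4, 11), N)]

weights = [sum(x) for x in CODE]
lambda_a = [max_autocorrelation(x) for x in CODE]
lambda_c = max_crosscorrelation(CODE[0], CODE[1])
fractions = {w: weights.count(w) / len(CODE) for w in set(weights)}
figure_of_merit = [w / lambda_c for w in weights]

if __name__ == "__main__":
    print(weights, lambda_a, lambda_c, fractions, figure_of_merit)
```

In the notation of the definition this toy code has $W = \{4, 3\}$, auto-correlation constraints of one for both weights, $\lambda_c = 1$, and $Q = \{1/2, 1/2\}$; the heavier codeword tolerates more interfering users before an error can occur.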


A system is proposed and analyzed which fully utilizes the
bandwidth available in the optical medium. This optical code-division
multiple-access system illustrated in Figure 1 has its signature sequences
on-off encoded on the frequency bands of the optical beam. Two
suboptimal detector options for multi-user operation are considered: an
optimized  single-user  detector  and  a  multistage  detector.  The  anal¬ 
ysis  is  performed  using  a  large  deviation  theory  approximation  and 
further  verified  by  simulation.  The  advantages  of  this  system  over 
the  conventional  time-encoded  system  include  the  larger  number  of 
low  crosscorrelation  sequences  available  and  the  implementation  of  ef¬ 
ficient  decoders  for  low  error  probability  detection. 

1.  Summary 

Two of the possible optical sources with bandwidths broad enough
are: a coherent mode-locked laser source and an incoherent source,
such as a superfluorescent fiber source. The mode-locked laser is more
difficult to implement than the incoherent source yet less noisy due
to its Poisson photoelectron count statistics, compared to the doubly
stochastic Poisson statistics of the incoherent source. This paper
concentrates on the laser based system, which can be considered a best
case yielding a lower error probability, mentioning the differences with
the incoherent source system when applicable.

The novelty of this system is in the CDMA encoding, which is
achieved by spatially spreading the optical spectrum and on-off
modulating the resulting frequency bands. The spreading is accomplished by
the use of a diffraction grating and the encoding is done via an
amplitude mask. The system is composed of $K$ users, labelled $k = 1, \ldots, K$,
each transmitting continuous binary information $b_i^{(k)} \in \{0,1\}$,
$i = \ldots, -1, 0, 1, \ldots$. The symbol $b_i^{(k)}$ on-off modulates the optical beam,
which is then encoded by a mask representing the sequence $A_{kj} \in \{0,1\}$,
$j = 1, \ldots, J$. The mask allows frequency bands corresponding
to $A_{kj} = 1$ to pass and blocks the other frequencies. The total
integrated intensity of the modulated signal in one frequency band over
the bit period $[0, T]$ is

$$[\ldots] = \int_0^T |E_{1,j}(t)|^2\,dt + [\ldots],$$

where $E_{k,j}(t)$ is the optical field of user $k$ in frequency band $j$ and $\tau_k$ is
its delay with respect to the desired user, user one. The number of low
crosscorrelation encoding sequences available to the spectral amplitude
encoded system is a factor of $J$ more than for the time-encoded system,
since the auto-correlation constraint can be completely relaxed. The
sequences of interest for the frequency encoded system are codes with
fixed weight $w$ and minimum distance $2(w-1)$, for a maximum overlap
of one frequency band.

The detection is based on spreading the signal spatially exactly
as in the encoding stage, and then detecting the individual frequency
bands using a photodetector array integrating over the entire bit
interval. One of two proposed detection algorithms then follows: an
optimized single-user detector or a multistage detector [1]. The
degradations to the system considered in this model are the multiple access
interference, the noise due to photoelectron statistics, and the dark
current noise. The photoelectron counts of the $J$ frequency bands,
labelled $N_j$, $j = 1, \ldots, J$, are Poisson distributed with parameter [...] for
the laser based system. The statistics for a truly incoherent source,
i.e., thermal light, depend on the shape of the spectrum but in this
case can be approximated by a negative binomial distribution. For the
laser system, letting $\lambda_{kj}(t_1, t_2) = [\ldots]$ and letting the
interference in frequency band $j$ be $I_j = [\ldots]$, the optimized
single-user detector has the form

[...]

with $p_{I_j}(\lambda)$ a convolution of all possible interference distributions, and
the multi-stage detector has the form

[...]

where $\hat{I}_j$ is an estimate of this interference based on the previous stage
of detection and $\gamma$ is the threshold.

The primary performance analysis tool is large deviation theory
[1], through which the probability of error is derived by computing
the expected value over uniformly distributed delays of the
synchronous OCDMA system in [1]. Using this scheme, an approximation
to the performance is obtained, which is verified by simulation to be
very accurate for all sequences considered. The coherent laser optical
source case is analyzed, considering a correlation detector, an
optimized single-user detector, and a multistage detector. The advantage
gained by the increase in code size is quantified and compared to
time-encoded systems, as illustrated in Figure 2. More than twice as many
users can be supported using spectral encoding than time encoding if
a multistage detector is employed. Yet as explained before, at such
bandwidths, the time-encoded system can only employ a correlation
detector, whose performance is shown to be unacceptable for a large
number of users.

This work is supported in part by NASA/Johnson Space Center under grant
NGT.5<M47 and by the Advanced Technology Program of the Texas Higher
Education Coordinating Board under Grant OOJSnt-ulg

Figure 1: Optical spectral amplitude encoded communication system.

Figure 2: (a) [...] system using correlation detector, (b) [...] using
multistage detector, (c) Spectral encoded system using multistage detector.


Deep space communication systems are traditionally designed to
use a concatenated coding scheme employing an inner and an outer code
in order to obtain reliable communication at low SNR. The Consultative
Committee for Space Data Systems, CCSDS, uses a concatenated
coding system with a rate 1/2, memory order 6, convolutional inner
code and a Reed Solomon (RS) block outer code over GF(256). Eight
RS words are organized into eight columns in a 255x64 bit data frame.
Before the data frame is transmitted, the data is trellis encoded by rows.
On the receiving end, decoding is performed in two steps: the inner
decoder uses the maximum-likelihood (soft) Viterbi algorithm and the
outer decoder uses RS algebraic decoding methods. Paaske has shown
that the performance of the CCSDS scheme can be improved by providing
multiple-pass feedback from the outer to inner decoders [1]. The
decoded output from the outer decoder is used by the inner decoder to
pin states in subsequent decoding passes. As Collins has shown [2], the
error correcting performance of a trellis code is improved when reliable
side information is used to pin states during decoding, roughly gaining
in energy efficiency by $10 \log_{10} f$, where $f$ is the pinning fraction.
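
The frame layout described above can be sketched in a few lines. This is an illustration under stated assumptions, not the CCSDS reference implementation: each of the eight columns is taken to be one byte (one GF(256) symbol) wide, rows are serialized most significant bit first, the RS words are placeholder byte sequences rather than real codewords, and all names are hypothetical.

```python
ROWS, COLS = 255, 8  # 255 symbols per RS word, eight RS words per frame


def build_frame(rs_words):
    """Place each RS word in one column; return 255 rows of 8 bytes."""
    assert len(rs_words) == COLS and all(len(w) == ROWS for w in rs_words)
    return [[rs_words[c][r] for c in range(COLS)] for r in range(ROWS)]


def row_bits(row):
    """Serialize one row of 8 bytes into 64 bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in row for i in range(8)]


def frame_to_bitstream(frame):
    """Row-by-row bit stream that would feed the inner (trellis) encoder."""
    return [bit for row in frame for bit in row_bits(row)]


def symbols_hit_by_burst(start_bit, burst_len):
    """Per column, the number of RS symbols touched by a burst of bit errors."""
    hits = [set() for _ in range(COLS)]
    for b in range(start_bit, start_bit + burst_len):
        row, col = divmod(b // 8, COLS)
        hits[col].add(row)
    return [len(h) for h in hits]


# Placeholder data standing in for eight RS codewords (not real codewords).
words = [[(31 * c + r) % 256 for r in range(ROWS)] for c in range(COLS)]
frame = build_frame(words)
bits = frame_to_bitstream(frame)

if __name__ == "__main__":
    print(len(frame), len(frame[0]), len(bits))
    print(symbols_hit_by_burst(4, 64))
```

Because consecutive bytes of the row-wise stream belong to different RS words, a burst of inner decoder errors up to one row long touches at most two symbols of any single word.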

In this paper, we study two issues related to further improvement
of the performance of this concatenated coding system, without increasing
the transmission power, the bandwidth, or the decoder complexity.
First, we investigate the options of byte versus bit interleaving between
the outer and inner systems. This has a subtle effect: byte interleaving
is a better choice as far as Reed-Solomon decoding goes, since inner
decoder errors are not diffused throughout the interleaving array. However,
we find that for a given pinning fraction, it is more advantageous
to scatter the side-information uniformly in time, i.e., use bit interleaving.
At the bottom line, however, we find that byte interleaving is
slightly superior.

Second, we investigate the design of RS outer coding with a variable
rate in various columns, keeping the overall frame redundancy constant.
Thus some codeword columns have less parity than nominal,
while others have much greater parity. This variable-rate feature is no
real complication for decoding due to the highly flexible decoding
algorithms for RS codes. Driven by a comment of Paaske that the multipass
scheme succeeds with high probability if one column is successful,
we design one column with high redundancy (but not so high as to
exhaust the available parity symbols), then incrementally design the
redundancy of other columns to provide high probability of success
given that the previous columns have been correctly decoded. Optimizing
this profile has been done experimentally by simulating the entire
system. At $E_b/N_0$ = [...] dB we suggest that a parity profile across the
eight columns of (26, 28, 32, 36, 100, 8, 4, 22) is near-optimal in terms of
maximizing probability of correct decoding of a frame. At 2 dB a more
uniform profile (26, 28, 30, 32, 90, 16, 14, 20) seems preferable. We note
the asymmetry of the profile accrues from the fact that pinning has an
asymmetric effect on cleaning out errors in the Viterbi decoding process;
known-state information prunes away more errors ‘ahead of’ the
decoder than behind it in time. The improvement in link efficiency with
this ‘optimized’ scheme is about 0.4 dB, clearly not a huge amount, but
fractions of a decibel are precious in this regime.

Simulations also show that use of larger-than-normal decoder
delay is helpful at very low SNR conditions. The usual rule of thumb
(five constraint lengths) suggests 30 bit decoder delay, but we observe
that 60 bit delay is a simple way to gain another 0.15 dB. Finally, relaxing
the restriction on bandwidth can significantly improve performance
at low SNR. By replacing the rate 1/2 inner code with a rate 1/4 code
optimized at low SNR [3], the bandwidth is expanded by a factor of
two, but by keeping the memory order of the inner code constant, the
decoder complexity remains constant and the required transmission
power can be reduced. Operation at $E_b/N_0$ = [...] dB is feasible in this
case.


For many years, the data stream sent by American deep-
space missions has been protected by error-correcting
codes.  For  Galileo,  a  probe  and  orbiter  currently  en  route 
to  explore  Jupiter  and  its  moons,  NASA’s  standard 
constraint  length  7,  rate  1/2  code  is  available,  but  a 
constraint  length  15,  rate  1/4  convolutional  code  was 
developed  for  the  mission  and  a  prototype  Viterbi  decoder 
for  long  constraint  length  codes  has  been  built.  For  parts 
of  the  mission,  the  convolutional  code  was  to  be  the  inner 
code  in  a  concatenated  system  with  an  outer  (255,223) 
Reed-Solomon  code. 

Data  compression  has  been  used  in  deep  space  much 
less  than  channel  coding  for  three  reasons.  First,  source 
models  are  neither  as  developed  nor  as  simple  as  the 
deep-space  channel  model  (AWGN).  Second,  most  data 
compression  algorithms  require  substantial  complexity  for 
encoding,  and  calculations  on  a  spacecraft  are  much 
more  difficult  than  on  the  ground.  But  third  and  most 
important,  scientists  are  very  slow  to  accept  the  distortions 
that  come  with  much  compression,  when  they  are  unlikely 
to  be  able  to  gather  this  data  again  in  their  careers. 
Galileo’s imaging system includes the option for lossless
data  compression. 

Last year, during its trip to Jupiter, Galileo’s collapsible
high-gain  antenna  was  scheduled  to  unfurl,  but  efforts  to 
open  it  have  not  been  successful  yet.  If  the  high-gain 
antenna  were  to  remain  closed,  all  communication  would 
be  via  a  low-gain  antenna.  Without  any  additional 
changes,  this  would  cause  the  achievable  data  rate  to 
drop  by  four  orders  of  magnitude.  A  small  part  of  this  loss 
is  due  to  the  fact  that  the  (15, 1/4)  encoder  is  not 
accessible  for  data  going  through  a  low-gain  antenna. 

If  the  high-gain  antenna  remains  unusable,  we  will 
change Galileo's error-correcting codes and data
compression  algorithm.  Of  course,  the  spacecraft 
hardware  is  completely  inaccessible,  but  at  the  data  rates 
we  are  now  considering  (about  100  bits  per  second)  a  lot 
can  be  done  in  software,  even  on  Galileo's  somewhat  old 
computers. 

Software convolutional encoding would be easy, except
that Galileo's design requires that most communication
through the low-gain antenna pass through the (7, 1/2)
encoder after leaving the spacecraft's computers. This
means that any code must be realizable as a concatenated
code, with a (7, 1/2) code on the inside. An (11, 1/2)
code concatenated with a (7, 1/2) code yields a (14, 1/4)
code, and many of these (14, 1/4) codes perform nearly as
well as the best known (14, 1/4) codes. But no such code
has taps on both ends of all connection vectors, a
property of "good codes" and a requirement for
straightforward decoding by our hardware Viterbi decoder.
Again, we are saved by data rate: we can decode the
resulting (14, 1/4) convolutional code in software. In
addition, we may change the outer Reed-Solomon code from
(255, 223) to a system with words of different parity in
each interleaved frame. Some words in each frame would
then be almost certain to decode, giving more information
about the state of the convolutional encoder; a second
Viterbi decoder can use this information to "redecode"
with a substantially smaller error rate.
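
The claim that an (11, 1/2) code feeding a (7, 1/2) code behaves as a single (14, 1/4) code can be checked with a little polynomial arithmetic over GF(2). The sketch below is illustrative only: the inner taps 0o171 and 0o133 are the widely used constraint-length-7 pair, the outer taps 0o3345 and 0o3613 are placeholders chosen for this example (they are not stated in the text), and the convention that bit i is the coefficient of D^i is an assumption.

```python
import random

def pmul(a, b):
    """Multiply two GF(2) polynomials stored as ints (bit i = coefficient of D^i)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def split_phases(h):
    """Write h(D) = he(D^2) + D*ho(D^2) and return (he, ho)."""
    he = ho = 0
    for i in range(h.bit_length()):
        if (h >> i) & 1:
            if i % 2 == 0:
                he |= 1 << (i // 2)
            else:
                ho |= 1 << (i // 2)
    return he, ho

def equivalent_generators(outer, inner):
    """Four generators of the rate-1/4 code equal to outer (rate 1/2) into inner (rate 1/2)."""
    g1, g2 = outer
    even, odd = [], []
    for h in inner:
        he, ho = split_phases(h)
        # Inner outputs produced while the g1-bit of the outer pair enters the inner encoder.
        even.append(pmul(he, g1) ^ (pmul(ho, g2) << 1))
        # Inner outputs produced while the g2-bit of the outer pair enters the inner encoder.
        odd.append(pmul(he, g2) ^ pmul(ho, g1))
    return even + odd

def conv_encode(bits, gens):
    """Feedforward encoder starting in the zero state, no tail bits."""
    out, state = [], 0
    for b in bits:
        state = (state << 1) | b  # bit i of state = input from i steps ago
        out.extend(bin(state & g).count("1") & 1 for g in gens)
    return out

def has_end_taps(g, constraint_length):
    """True if the connection vector has taps at both its first and last position."""
    return bool(g & 1) and bool((g >> (constraint_length - 1)) & 1)

INNER = (0o171, 0o133)    # common K=7 taps; bit ordering is an assumption
OUTER = (0o3345, 0o3613)  # placeholder K=11 taps, for illustration only
GENS = equivalent_generators(OUTER, INNER)
CONSTRAINT_LENGTH = max(g.bit_length() for g in GENS)
```

With these placeholder taps, `CONSTRAINT_LENGTH` evaluates to 14, and encoding through the cascade gives the same bits as encoding directly with `GENS`. At least one of the four connection vectors lacks an end tap, which illustrates (but does not prove in general) the remark about taps above.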

Until  now,  data  compression  algorithms  for  deep-space 
have  been  limited  to  lossless  compression,  and  thus  to 
about  3.6  bits/pixel,  while  slightly  lossy  algorithms  like  the 
proposed  Joint  Photographic  Experts  Group  (JPEG) 
standard  show  almost  no  visual  degradation  at  less  than 
1  bit/pixel,  compared  to  our  original  8  bits/pixel.  Because 
the  communications  rate  with  the  low-gain  antenna  is  so 
low,  many  of  the  planned  images  would  not  be  sent  at  all, 
and  so  the  small  distortions  introduced  by  data 
compression  are  now  much  more  attractive  to  the 
scientists. 

A JPEG standard 8x8 Discrete Cosine Transform (DCT) is
not possible within our constrained memory and
computation resources, but we intend to implement a
similar multiplication-free integer transform, the Integer
Cosine Transform (ICT). The ICT can compress a typical
planetary image 10:1 with an RMS error of 1 gray level
(out of 256; peak SNR 48 dB), or 20:1 with an RMS error
of 2, with memory requirements of 4K bytes for code and
7K bytes for buffer, using 32 adds and 12 shifts for each
8-point DCT. This algorithm will be used on Galileo's
images if a low-gain mission is required.
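
The quoted figures can be cross-checked with two short formulas. The sketch below assumes 8-bit pixels with a peak value of 255 and the usual definition of peak SNR as 20 log10(peak / RMS error); the function names are illustrative.

```python
import math

def psnr_db(rms_error, peak=255):
    """Peak signal-to-noise ratio in dB for a given RMS error in gray levels."""
    return 20 * math.log10(peak / rms_error)

def bits_per_pixel(compression_ratio, original_bpp=8):
    """Average bits per pixel after compression by the given ratio."""
    return original_bpp / compression_ratio

# 10:1 with RMS error 1, and 20:1 with RMS error 2
summary = [(bits_per_pixel(10), psnr_db(1)), (bits_per_pixel(20), psnr_db(2))]
```

An RMS error of 1 gray level gives about 48.1 dB, in line with the 48 dB quoted above, at 0.8 bits/pixel; an RMS error of 2 gives about 42.1 dB at 0.4 bits/pixel.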
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Abstract - We propose a modification of generalized concatenated
codes, which allows the construction of some best known binary linear
codes in a very simple way. As another application we show that by
using this method we can generate a big class of optimal linear unequal
error protection codes (LUEP codes) very easily and that most of
the constructions given by van Gils [1] are special cases of this new
method. A big advantage of our method is that all constructed codes
can be decoded very easily by the well-known Blokh-Zyablov-Zinov'ev
algorithm with a slightly modified metric.

Summary 

Up to now all constructions of generalized concatenated codes (GC
codes, [2][3]) have been restricted to outer codes $A_i$ of constant length
$n_a$ and only one inner code $B$ together with its partitions. In this
paper we construct binary GC codes which have outer codes $A_i$ of
different lengths $n_{ai}$ and hence different binary inner codes in
the different columns of the code matrix.

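Denote by $A_i : (q_i; n_{ai}, k_{ai}, d_{ai})$ the outer code $A_i$ over $GF(q_i)$
of length $n_{ai}$, dimension $k_{ai}$ and minimum distance $d_{ai}$, and by
$B_j^{(i)} : (n_{bj}, k_{bj}^{(i)}, d_{bj}^{(i)})$ the $i$th subcode of the $j$th binary inner code, with
$i = 1, 2, \ldots, m$ and $j = 1, 2, \ldots, n_{a,\max}$, $n_{a,\max} = \max_i \{n_{ai}\}$. By concatenating
outer and inner codes, the symbols of the outer code $A_i$ are used to
label the subcode $B_j^{(i+1)}$ and its cosets obtained by partitioning the $i$th
subcode $B_j^{(i)}$. The new GC code has the parameters:

$$n = \sum_{j=1}^{n_{a,\max}} n_{bj}, \qquad k = \sum_{i=1}^{m} k_{ai} \log_2 q_i,$$

$$d_{\min} \ge \min_{1 \le i \le m} \; \min_{J_i} \sum_{j \in J_i} d_{bj}^{(i)},$$

where $J_i \subseteq \{1, \ldots, n_{ai}\}$ with $|J_i| = d_{ai}$.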

Table 1 shows some best known codes constructed in this way. The
inner code $B^{(1)}$ and its subcodes for $j = 1, 2, \ldots, n_{a,\min}$ are given in the
first column, and the inner codes used in the remaining positions are
given in the following columns.
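
The parameters of an entry in Table 1 can be recomputed from its inner and outer codes. The sketch below assumes the distance bound takes the usual generalized-concatenation form: for each level, sum the $d_a$ smallest inner distances among the columns that the outer code covers, then take the minimum over the levels. The data layout and function name are illustrative, and the reading of the (95,52,14) entry (15 columns of length 6, one column of length 5, second outer code of length 16 over GF(2^4)) is an assumption about a partly garbled table.

```python
def gc_parameters(column_lengths, level_distances, outer_codes):
    """Length, dimension and distance bound of a (modified) GC code.

    column_lengths   -- inner code length for each column of the code matrix
    level_distances  -- level_distances[i][j] = distance of the level-i inner
                        subcode in column j, for the columns covered by outer code i
    outer_codes      -- (bits_per_symbol, n_a, k_a, d_a) for each level
    """
    n = sum(column_lengths)
    k = sum(bits * k_a for bits, _, k_a, _ in outer_codes)
    bounds = []
    for dists, (_, n_a, _, d_a) in zip(level_distances, outer_codes):
        if len(dists) != n_a:
            raise ValueError("each outer code needs one inner distance per position")
        # Worst case: the d_a nonzero outer symbols sit on the weakest columns.
        bounds.append(sum(sorted(dists)[:d_a]))
    return n, k, min(bounds)

# Assumed reading of the (95,52,14) entry of Table 1.
columns = [6] * 15 + [5]
distances = [[1] * 16, [2] * 16, [6] * 15]
outer = [(1, 16, 1, 16), (4, 16, 10, 7), (1, 15, 11, 3)]
example = gc_parameters(columns, distances, outer)
```

Under these assumptions `example` comes out as (95, 52, 14), with the second level (7 columns of distance 2) setting the bound.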

The same idea can be used to construct optimal LUEP codes. In
the following we give some examples.

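Construction I: First we consider a two-level GC code ($m = 2$),
where we take $A_1 : (2^{k_{b1}}; n_a, k_{a1}, \underline{s}_1)$ and $A_2 : (2^{k_{b2}}; n_a, k_{a2}, \underline{s}_2)$ as outer
codes and $B_1 : (n_b, k_{b1} + k_{b2}, d_{b1})$ and its subcode $B_2 : (n_b, k_{b2}, d_{b2})$ as
inner codes. $A_1$ and $A_2$ are LUEP codes with nonincreasing separation
vectors $\underline{s}_1 = (s_{11}, s_{12}, \ldots, s_{1k_{a1}})$ and $\underline{s}_2 = (s_{21}, s_{22}, \ldots, s_{2k_{a2}})$. If
$d_{b1} s_{1k_{a1}} > d_{b2} s_{21}$, then the GC code is a binary $(n_a n_b, k_{a1} k_{b1} + k_{a2} k_{b2}, \underline{s})$
LUEP code, where
$\underline{s} = (d_{b1} s_{11} \underline{1}_{k_{b1}}, \ldots, d_{b1} s_{1k_{a1}} \underline{1}_{k_{b1}}, d_{b2} s_{21} \underline{1}_{k_{b2}}, \ldots, d_{b2} s_{2k_{a2}} \underline{1}_{k_{b2}})$
($s \underline{1}_{k_b}$ denotes the $k_b$-vector with all components
equal to $s$). It can be seen that Constructions 1, 3A and 5 in [1]
are special cases of the above construction, where some special codes
are used as outer and inner codes to obtain optimal LUEP codes.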

Construction II: The GC code with $A_1 : (2^{(n_b - k_b)}; n_a + n', k_{a1}, \underline{s}_1)$
and $A_2 : (2^{k_b}; n_a, k_{a2}, \underline{s}_2)$ as outer codes and $B_1 : (n_b, n_b, 1)$ and
$B_2 : (n_b, k_b, d_{b2})$ as inner codes is a modified GC code, where the
length of the first outer code $A_1$ is larger than the length of $A_2$. In this
case the last $n'$ symbols of a codeword in $A_1$ are not concatenated with
other codes and are just appended to the first $n_a n_b$ concatenated bits. The
LUEP code has the parameters $n = n_a n_b + (n_b - k_b) n'$, $k = (n_b - k_b) k_{a1} + k_b k_{a2}$
and $\underline{s} = (s_{11} \underline{1}_{n_b - k_b}, \ldots, s_{1k_{a1}} \underline{1}_{n_b - k_b}, d_{b2} s_{21} \underline{1}_{k_b}, \ldots, d_{b2} s_{2k_{a2}} \underline{1}_{k_b})$,
where $s_{1k_{a1}} > d_{b2} s_{21}$. Optimal LUEP codes can be obtained if some
special codes are used as outer and inner codes. The codes from
Constructions A, C, E, F, I, J and K in [1] can also be obtained in this way.

Inner codes, first columns    Inner codes, further columns    Outer codes                                   GC code
(8,4,4), (8,1,8)              (7,3,4); (4,1,4)                (2^3; 10, 7 bits, 8), (2; 8,4,4)              (75,11,32)
(8,4,4), (8,1,8)              (7,3,4); (4,3,2)                (2^3; 10, 3, 8), (2; 8,4,4)                   (75,13,30)
[illegible]                   (7,3,4); (3,1,3)                (2^3; 10, 7 bits, 8), (2; 8,4,4)              (74,11,31)
(8,4,4), (8,1,8)              (4,3,2); (4,3,2)                (2^3; 10, 3, 8), (2; 8,4,4)                   (72,13,28)
(6,6,1), (6,5,2), (6,1,6)     (5,5,1), (5,4,2)                (2; 16,1,16), (2^4; 16,10,7), (2; 15,11,3)    (95,52,14)

Table 1: Some modified GC codes

Construction III: The GC code with $A_1 : (2^{k_b}; n_a, k_{a1}, \underline{s}_1)$ and
$A_2 : (2; n_a, k_{a2}, \underline{s}_2)$ as outer codes and $B_1 : (n_b, k_b + 1, d_b)$ and
$B_2 : (n_b, 1, n_b)$ as inner codes is a special case of Construction I. If we add
to the outer code $A_2$ a parity bit, which is not concatenated with the
inner code $B_2$, then for $s_{1k_{a1}} d_b > s_{21} n_b$ we obtain a new
$(n_a n_b + 1, k_{a1} k_b + k_{a2}, (s_{11} d_b \underline{1}_{k_b}, \ldots, s_{1k_{a1}} d_b \underline{1}_{k_b}, (n_b - 1) \underline{s}_2 + 2 \lceil \underline{s}_2 / 2 \rceil))$
LUEP code, where $\lceil \underline{s}_2 / 2 \rceil$ denotes $\lceil s_{2i} / 2 \rceil$ for all $i = 1, 2, \ldots, k_{a2}$.
Here, if we take some
special codes as outer and inner codes, the same codes can be obtained
as  from  the  Construction  3B,  Band  Bin  [1] 

A big advantage of the new construction method is that the codes
can be decoded very easily up to half of their minimum distance. The
decoding algorithm is quite similar to the well-known Blokh-Zyablov-
Zinov'ev algorithm but with a slightly modified metric.
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ABSTRACT 

This  paper  presents  the  analysis  of  concatenated  coding  systems 
with  inner  convolutional  codes  and  outer  burst-error-correcting  codes. 
Contrary  to  most  studies,  interleavers  are  not  required  to  perform  the 
analysis.  A  modelling  technique  for  the  outer  channel  resulting  from  a 
convolutional  decoder  operating  over  a  wide  variety  of  inner  channels  is 
presented. The proposed outer model is simple and is determined for inner
channels with and without memory. The resulting finite state outer channel
model is entirely characterized by the transition probabilities between the
states and a technique to compute these transition probabilities is proposed.
For memoryless channels, hard and soft decision decoding are both
considered. The well known Gilbert and Fritchman models are used to
represent channels with memory. Once the outer channel model is
determined,  it  is  used  to  compute  the  bit  error  performance  of  the  entire 
concatenated  coding  system  considering  both  convolutional  and  block  outer 
codes,  the  Massey  k-diffuse  and  the  Reed-Solomon  codes  respectively. 
Several  examples  are  presented  to  illustrate  the  applicability  of  the  method. 
Results  are  obtained  both  by  analytical  methods  and  computer  simulation. 

SUMMARY 

Concatenated  coding  schemes  are  used  in  digital  communication 
systems.  They  have  the  ability  to  correct  long  error  sequences  and  provide 
very  high  reliability.  This  paper  presents  a  methodology  to  evaluate  the 
performance of concatenated coding systems based on inner convolutional
codes  and  outer  burst-error-correcting  codes.  A  typical  configuration  is  a 
short  constraint  length  inner  convolutional  code  followed  by  a  Reed- 
Solomon  (RS)  outer  code.  An  essential  aspect  of  our  work  is  that  we  can 
also  consider  the  use  of  an  outer  convolutional  code. 

In  this  work  particular  attention  is  paid  to  the  modelling  of  the  outer 
channel formed by the inner encoder, the inner channel and the inner
decoder. When a maximum likelihood decoder like the Viterbi decoder is
used, the outer channel exhibits a tendency to produce bursts and should be
modelled by a channel with memory. Often an interleaver is used to remove
the memory from the outer channel; its presence is not required in this study
but can be accounted for if needed. Since a maximum likelihood decoder
generates  error  events  of  various  lengths,  the  decoding  process  is  very 
difficult  to  analyze.  To  circumvent  this  difficulty,  we  have  modelled  the 
errors  at  the  output  of  a  sub-optimum  decoder  called  the  sliding  window 
decoder  (SWD).  This  decoder  operates  on  a  finite  window  size  L  and 
approaches  the  performance  of  the  Viterbi  decoder  as  L  increases  to  infinity. 
The decoding process of the SWD is modelled by a 2-state Markov chain, with a
correct and an incorrect state, characterized by the transition probabilities
Pci and Pic. The decoding model (i.e., the outer model) has been determined
for both memoryless channels and channels with memory. The memoryless
channels are the BSC and the AWGN channel with BPSK modulated
signals and soft decision decoding. Quantization with a finite number of
intervals is considered and arbitrary metric assignments can be used.
Results for a (2,1,2) code and integer metric assignments are illustrated in
Figure 1. Gilbert's [1] and Fritchman's [2] models are used to represent
channels with memory.
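
A two-state chain of this kind is fully described by Pci (correct to incorrect) and Pic (incorrect to correct). As a minimal sketch, assuming the chain is stepped once per decoded bit and that time spent in the incorrect state is what matters, the long-run fraction of time in the incorrect state and the mean length of an incorrect run follow directly from the two probabilities. The function names and numerical values are illustrative and not taken from the paper.

```python
import random

def stationary_incorrect_fraction(p_ci, p_ic):
    """Long-run fraction of steps spent in the incorrect state."""
    return p_ci / (p_ci + p_ic)

def mean_incorrect_run(p_ic):
    """Mean number of consecutive steps spent in the incorrect state."""
    return 1.0 / p_ic

def simulate_incorrect_fraction(p_ci, p_ic, n_steps, seed=0):
    """Monte Carlo estimate of the fraction of steps spent in the incorrect state."""
    rng = random.Random(seed)
    incorrect = False
    count = 0
    for _ in range(n_steps):
        if incorrect:
            if rng.random() < p_ic:
                incorrect = False
        elif rng.random() < p_ci:
            incorrect = True
        count += incorrect
    return count / n_steps

analytic = stationary_incorrect_fraction(0.01, 0.2)
estimate = simulate_incorrect_fraction(0.01, 0.2, 200000)
```

A small Pic relative to Pci means long bursts, which is the memory effect that an outer burst-error-correcting code has to cope with.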

Then  the  bit  error  performance  of  the  concatenated  coding  system 
considering  both  convolutional  and  block  outer  codes  is  determined  from  the 
outer model mentioned above. The outer convolutional codes considered are
the k-diffuse Massey [3] codes, which are entirely defined by their burst-
error-correcting capability B and the required guard space G. The outer
block codes considered are the well known RS codes, which can correct both
multiple bursts and random errors. These codes are specified by their block
length and their symbol-error-correcting capability. All the theoretical
results have been verified against simulation results and show good
agreement. Figure 2 shows the simulated versus analytical performance
results for both random error and burst error channels. In this figure the
detrimental effect of memory on the performance of the concatenated system
can be observed. Results also indicate that the Reed-Solomon codes
outperform the Massey diffuse codes for the same error-correcting
capability. Finally we find that soft decision decoding on the inner code with
a finite number of levels (Q = 4 and 8) can provide most of the gain in
performance for the concatenated system that would be obtained using
infinitely fine quantization.
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Figure  1  Performance  of  a  (2,1,2)  code:  BPSK  modulation  and  soft 
decision  decoding. 



Figure  2  Performance  of  a  Concatenated  Coding  system:  BSC  compared  to 
a  Gilbert  model. 
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Abstract 

Hybrid-ARQ protocols are known to improve the reliability
of communication systems at the expense of the total throughput.
Systems based on trellis coded modulation (TCM) can be
modified for use in type-I hybrid-ARQ schemes. In order to regain
some of the lost throughput due to retransmission requests,
various adaptive code rate algorithms can be applied. In this
paper the concept of averaged diversity combining is considered.
Multiple  received  copies  of  the  same  data  packet  are  combined 
on  a  symbol-by-symbol  basis  to  improve  the  performance.  A 
simple  example  of  an  adaptive  rate,  TCM  hybrid-ARQ  proto¬ 
col  is  developed  and  performance  bounds  on  both  bit  error  rate 
and  throughput  are  derived  for  AWGN  channels.  To  verify  the 
bounds  simulation  results  have  been  obtained  and  are  presented. 
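
Symbol-by-symbol averaging of L received copies can be sketched in a few lines. The example assumes real-valued antipodal symbols (one quadrature component) and independent Gaussian noise on each copy; under those assumptions the noise variance of the averaged packet is the single-copy variance divided by L. All names and parameter values are illustrative.

```python
import random
import statistics

def average_copies(copies):
    """Symbol-by-symbol average of several received copies of one packet."""
    n_copies = len(copies)
    return [sum(column) / n_copies for column in zip(*copies)]

def receive(symbols, sigma, rng):
    """Add independent Gaussian noise of standard deviation sigma to each symbol."""
    return [s + rng.gauss(0.0, sigma) for s in symbols]

def residual_noise_variance(n_copies, sigma=1.0, n_symbols=20000, seed=1):
    """Measured noise variance after averaging n_copies noisy copies."""
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(n_symbols)]
    copies = [receive(symbols, sigma, rng) for _ in range(n_copies)]
    combined = average_copies(copies)
    return statistics.pvariance([c - s for c, s in zip(combined, symbols)])

single = residual_noise_variance(1)
averaged = residual_noise_variance(4)
```

With four copies the measured variance is close to one quarter of the single-copy value, which corresponds to an SNR improvement of about 6 dB.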


1  Summary 

In this paper we examine the performance of a trellis coded hybrid-
ARQ protocol (TCM-HARQ) with packet combining. The Yamamoto
and Itoh algorithm [1] is used to generate retransmission requests as
investigated by Wicker and Rasmussen [2]. The derivation of the
throughput bounds for the combining protocol follows the approach
of Kallel and Haccoun [3]. The expected number of transmissions is
bounded as follows:

$$1 + \sum_{i=1}^{\infty} \left( \prod_{j=1}^{i} P_l(R_j) \right) \le T_r \le 1 + \sum_{i=1}^{\infty} P_u(R_i) \qquad (1)$$

Here, $P(R_L)$ is the probability of a retransmission after $L$ packets have
been combined. The lower index on $P$ indicates whether a lower or
an upper bound is to be used. The averaged diversity combining of $L$
packets is equivalent to a decrease of the effective noise variance by a
factor $1/L$. Introducing the noise improvement factor $1/L$, the bounds
on both throughput and bit error rate (BER) developed by Wicker and
Rasmussen [2] can then be applied. For the BER the upper bound is
as follows:


[Equation (2): an upper bound on $P_b$ in terms of $P_u(B_i)$ and $P(R_i)$; its terms are not legible in this copy.]    (2)


Here, $P(B_L)$ is the probability of bit error after $L$ packets have been
combined. The lower bound is derived the same way.

[Equation (3): a lower bound on $P_b$ with leading factor $P_l(B_1)$; the remaining terms are not legible in this copy.]    (3)


For low signal-to-noise ratios (SNR) the lower bound tends to break
down. An approximation is thus more useful.

[Equation (4): an approximation for $P_b$ in terms of $P_l(B_i)$ and $P_l(R_i)$; its terms are not legible in this copy.]    (4)

A simple 2-state, 4-PSK TCM-HARQ protocol has been investigated
in detail and simulation data obtained. The results are shown
in Figures 1 and 2. For the BER in Figure 1 the approximation is
noted to be very good. The awkward behavior of the BER should be
noted here as more and more packets are combined. Under conditions
in which all packets are combined with at least one other packet, a
decrease in the BER is observed. This phenomenon is easily explained,
for the combination of packets is equivalent to improving the SNR.
At some point a majority of accepted packets will consist of combined
packets and thus the effective SNR will be improved. In Figure 2 a
substantial improvement in throughput is observed. At SNRs where
no throughput is expected for the non-combining case it is now possible
to get close to 50 % of full throughput through packet combining.
The simulated throughput follows the lower bound very closely. This
bound is thus a good approximation to the real throughput.

Figure 1: BER performance of a TCM-HARQ system based on a
(2,1,1) convolutional encoder and QPSK modulation.

Figure 2: Throughput performance of a TCM-HARQ system based on
a (2,1,1) convolutional encoder and QPSK modulation.
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Coded ARQ, or hybrid ARQ, is an effective scheme for
attaining high reliability and high throughput on channels with
a moderate noise intensity. In 1980, Yamamoto and Itoh proposed
a coded ARQ based on a convolutional code and Viterbi
decoding and showed that a good bit error probability is attained.
As the channel noise increases, however, its performance rapidly
deteriorates because of increased retransmission requests and
decoding errors. At ISIT 1990, the author proposed a new coded
ARQ scheme which exploits, for error detection, the error propagation
caused by reduced-complexity decoding of convolutional
codes with an extremely large constraint length and showed, by a
random coding argument, the attainability of both high reliability
and high throughput. This asymptotic result, however, does
not assure that good performance is attained in practice; a more
realistic analysis has been needed.

In this paper, we present a simple scheme for constructing the
desired  code,  say  the  ARQ  code  hereafter,  from  a  convolutional 
code  with  a  short  constraint  length  and  from  a  BCH  block  code. 
In  this  construction,  the  convolutional  code  takes  the  role  of  error 
correction  and  the  block  code  takes  the  role  of  propagating  a  de¬ 
coding  error  to  the  next  ARQ  block.  This  ARQ  code  possesses  a 
particular  unit-memory  structure  between  ARQ  blocks  besides  its 
original  trellis  structure. 

Recently, we found that Kudryashov also discussed a coded
ARQ scheme using such a unit-memory structure between ARQ
blocks and considered its performance by a random coding argument.
Although his coding scheme is basically block coding and no
particular code construction scheme is suggested, the two works are
closely related. Stimulated by his work, we next show
that the performance bound presented before can be considerably
strengthened.

We show, using the above-mentioned ARQ code and the generalised
Viterbi decoding algorithm, that good performance is obtained
even for considerably large channel noise. Especially interesting
is that high reliability is attained near $R_{comp}$, the computational
cut-off rate of the channel. This is not expected for coded ARQ
schemes based on sequential decoding algorithms. In the simulation,
at least several thousand ARQ blocks of block length about
500 bits are transmitted and "high reliability" means that no
incorrect acceptance of an erroneously decoded ARQ block is observed.
For a larger channel noise, however, the throughput becomes
sensitive to the rule which decides whether a particular ARQ block is
decoded incorrectly. Our original scheme, as well as Kudryashov's
scheme, uses the threshold decision $\log(\cdot) > T$ for error detection.
We also consider the use of error-detecting codes for error
decision and compare the performance through a theoretical analysis
and simulation. It is shown that the use of an error-detecting
code increases the robustness of the scheme and allows us to attain
high reliability at rates above $R_{comp}$.
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Abstract  —  We  consider  the  use  of  rate-compatible  variable-rate 
punctured  codes  for  implementing  an  adaptive  transmission  scheme 
for  MB  communication.  The  performance  of  the  scheme  is  investi¬ 
gated  both  theoretically  and  through  computer  simulation.  The  re¬ 
sults  indicate  that  a  rate-adaptive  strategy  leads  to  a  more  efficient 
use of available received power than fixed coding rate strategies.

I  Introduction 

Meteor-burst (MB) communication is an attractive means of
beyond line of sight radio communication for several applications
[1]. In MB communication, propagation is achieved through
reflection of transmitted signals from trails of ionized particles
created by meteors entering the atmosphere. MB channels are
characterized by random availability and received signal levels that
decay with time. For the most prevalent type of trails, the
signal-to-noise ratio (SNR) is modeled as exponentially decaying, i.e.,
$SNR(t) = SNR_0 e^{-t/\tau}$. The initial signal-to-noise ratio $SNR_0$ and
the decay parameter $\tau$ are random variables which differ from
trail to trail [2].

Error  control  coding  provides  powerful  means  for  dealing  with 
uneven  received  power  [3].  With  a  fixed  coding  rate  scheme  how¬ 
ever,  power  is  wasted  at  the  beginning  of  trails,  while  at  the  end 
of  trails  noise  conditions  are  severe.  We  consider  a  solution  that 
relies  on  the  use  of  variable-rate  punctured  convolutional  codes 
for  implementing  an  adaptive  transmission  scheme  for  MB  com¬ 
munication. 

II  Adaptive  transmission  scheme 

Error  performance  may  be  considered  satisfactory  as  long  as  the 
residual  Bit  Error  Rate  (BER)  is  kept  below  an  acceptable  level. 
The  efficiency  of  the  transmission  scheme  is  then  measured  by 
the throughput achieved. Our scheme relies on the use of rate-compatible
variable-rate punctured codes [4] for adapting the error
correcting power to the variations of received power. Punctured
codes with coding rates close to one are used for the first
transmitted bits of a trail, which require almost no error protection
and, in parallel with the decay of received signal power, more
and more redundancy is added to the transmission by progressively
decreasing the coding rate.
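
The adaptive strategy described above can be sketched as a table lookup driven by the decaying SNR. This is a minimal illustration, not the authors' scheme: the list of code rates, the SNR thresholds attached to them and all function names are hypothetical, and the only modelling assumption taken from the text is an exponentially decaying SNR along the trail.

```python
import math

# Hypothetical rate table: (code rate, minimum SNR in dB assumed to meet the target BER).
RATE_TABLE = [(8 / 9, 9.0), (4 / 5, 7.5), (2 / 3, 6.0), (1 / 2, 4.5), (1 / 3, 3.0)]

def snr_db(t, snr0_db, tau):
    """SNR in dB at time t for an exponentially decaying trail with time constant tau."""
    return snr0_db - 10 * math.log10(math.e) * t / tau

def select_rate(current_snr_db):
    """Highest code rate whose assumed threshold is met, or None if the trail is unusable."""
    for rate, threshold_db in RATE_TABLE:
        if current_snr_db >= threshold_db:
            return rate
    return None

def schedule(snr0_db, tau, step, n_steps):
    """Code rate chosen at each of n_steps instants spaced by step seconds."""
    return [select_rate(snr_db(i * step, snr0_db, tau)) for i in range(n_steps)]

rates = schedule(snr0_db=12.0, tau=0.5, step=0.05, n_steps=30)
```

The resulting schedule starts at the highest rate, steps down as the trail decays and ends with `None` once even the lowest rate cannot meet its threshold. A fixed-rate scheme would correspond to a single row of the table.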

III  Performance  of  the  scheme 

The performance of the scheme has been investigated both
theoretically and through computer simulation. The theoretical
performance is obtained by modeling the time-varying MB channel
as a rapid succession of stationary channels with decreasing
values of average SNR. Error performance is obtained using classical
union bound arguments, assuming additive white Gaussian noise.
Several combinations of modulation types and quantization have
been considered. Computer simulations were conducted using a
similar model.

Both theoretical and simulation results indicate that the
rate-adaptive strategy leads to a more efficient use of available received
power than fixed coding rate strategies. As Fig. 1 shows, a
plateau is observed on BER curves. This plateau corresponds to
a range where received power in excess of what is required for the
target BER is exchanged for additional throughput. Indeed, the
average bit rate is seen to increase with increasing initial SNR in
this range (Fig. 2).

The rate-adaptive strategy clearly outperforms fixed coding
rate strategies with or without interleaving, achieving better
overall throughput (Fig. 2). Some trails that do not provide sufficient
SNR for sustaining transmission with the fixed rate strategies
may be exploited with the adaptive scheme, which has the further
advantage of providing continuous improvements in throughput.

This work was supported in part by the Natural Sciences and
Engineering Research Council of Canada and by the Fonds de l'UQAM.

Figure 1: BER versus initial symbol energy-to-noise ratio. M =
3, r = 2, non-coherent FSK, hard quantization.

Figure 2: Throughput versus initial symbol energy-to-noise ratio.
M = 3, r = 2, non-coherent FSK, hard quantization.
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SUMMARY 

Many future communication systems employ a large number of
channels tightly packed in separate frequency bands which need to be
demultiplexed at a central site to achieve network connectivity.
Mobile cellular and very small aperture satellite (VSAT)
communication systems are two important examples where numerous
channels are carried in a spectral segment through a central processing
location. The practical feasibility of such systems rests on efficiently
sharing common processing resources for simultaneously
demultiplexing channels. This increased efficiency introduces a
heightened susceptibility to both permanent and temporary failures in
the underlying demultiplexing digital hardware. This paper
demonstrates how real convolutional codes can be used for defining
embedded parity streams which detect processing failures.

Multirate digital signal processing techniques capitalize on the
narrowband nature of individual output channels, permitting lower
sample rates for most internal processing operations. The wide input
spectrum requires a high sample rate whereas individual information
channels need much lower sample rates. An efficient realization of the
multichannel demultiplexer separates channels by spectrally shifting a
prototype filtering operation. However, it may be segmented into
many short filter sections, each operating at a lower sampling rate,
whose outputs are passed through an appropriately-sized fast Fourier
transform (FFT). The FFT realization is a major reason for the
increased efficiency. Furthermore, the input A/D converters may be
used in a rotating fashion to achieve the very high speed sampling
necessary at the input. Contiguous samples are relegated to their
respective positions in the lower-speed processing sections for the
remaining necessary operations. There is a wide range of efficient
configurations of banks of subfilters for most practical applications.
Figure 1 dramatizes the sharing nature of a multirate filter bank
demultiplexer.

Any failure, whether permanent or temporary, virtually anywhere
in an efficient multirate, multichannel realization impacts many
channels simultaneously. One effective protection approach employs
an algorithm-based fault tolerance methodology, wherein parallel
parity generating channels simultaneously produce a few parity
samples for comparison with related parity values generated from the
system outputs. The parallel parity producing resources may be used
as standby units for replacing failed resources. Of course, the
protection levels are then reduced accordingly. Real convolutional codes,
derived from burst-correcting binary convolutional codes, are ideally
suited for determining the parity values employed in the
algorithm-based fault tolerance approach. These codes are used in a detection
mode only, and system diagnostic and reconfiguration phases may
follow the detection of failures.

Rate K/(K+1) systematic binary burst-correcting convolutional
codes produce one parity sample for every K information positions
while still detecting the onset of a burst within a constraint length.
When these codes are viewed over the real numbers and are judged by
a real Hamming distance metric, excluding roundoff errors, it is
known that their real error-detecting levels are at least as good as the
binary precursor codes. The parity positions in these real number
convolutional codes dictate a parity generating, finite impulse
response filter with Z transfer function Q(Z) containing only 0 and 1
weights. The parity filter operates at a rate reduced by factor K and
has a memory span determined by the constraint length of the original
binary code.

The parity filter is used to produce low rate parity values from
each demultiplexer channel. On the other hand, comparable parity
values are generated in parallel with the demultiplexer from the
original high-speed input samples, by combining the demultiplexing
prototype filter H(Z) with the parity transfer function Q(Z). The
required parity subsystem is very similar to the main demultiplexer;
however, in the parallel parity process, the computational rates in the
segmented filter sections and the FFT are reduced further by factor
K, a design parameter of the real code. The error-detecting fault
tolerance capability for a generic channel r is depicted in
Figure 2, where the down sampling feature in parity filter Q(Z) is
indicated by ↓K. The comparator contains a threshold tolerance to
accommodate roundoff noise introduced by the different
computational paths for related parity values. The use of rotating A/D
converters, necessary for high-speed performance, leads to easily
detectable error energy in well-defined spare channels when an A/D
converter fails.
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Abstract 

We propose a new construction of trellis-coded vector quantizers
(TCVQs), based upon our construction of trellis-coded quantizers
(TCQs). The new construction yields TCVQs with a higher
performance and is simpler than the previous construction given
in [1].

The  performances  of  the  new  TCVQs  have  been  determined  for 
the  memoryless  Gaussian,  Laplacian,  and  uniform  sources.  The 
experiments  that  have  been  performed  for  various  computational 
complexities  and  vector  dimensions  show  that,  at  a  constant  rate 
and  complexity,  the  performances  of  the  TCVQs  decrease  as  the 
dimension  increases. 

Thus, for memoryless sources, at the same rate and computational
complexity, our TCQs are superior to our TCVQs.

Summary 

A first constructive design method for trellis-coded quantizers
(TCQs) and its extension to trellis-coded vector quantizers
(TCVQs) have been given in [1, 2]. Recently, we proposed
a new construction of TCQs for the rate of 1 b/sample [3, 4] as
well as its extension to the higher rates [5]; our TCQs outperform
those of [2]. Here, we present the extension of our construction to
TCVQs.

The new construction yields TCVQs with a higher performance
and is simpler than that of [1], since it does not make use of
convolutional codes and uses trellises corresponding to a shift register.
In [1], only two experiments were presented; these showed that the
TCVQs outperformed the TCQs of [2]. We, however, performed
various experiments which show that our TCQs are superior to
our TCVQs, at the same computational complexity.

The TCVQs we consider have $2^{\nu}$ states, with two branches
entering and leaving each state. Each branch is assigned a set of
representation symbols, according to the structure defined in [5].
Of course, in this case the representation symbols are not scalars,
but vectors. Specifically, for quantizing at R b/sample using
N-dimensional representation vectors (N = 1, 2, ...), each set
contains $2^{RN-1}$ vectors.

To compare the performances of the TCVQs with those of our
TCQs, experiments have been performed for memoryless Gaussian,
Laplacian, and uniform sources. For the experiments, as
in [1, 2, 3, 4, 5], a training set of N · 100 000 independent random
samples was used. To optimize the codebook, 100 iterations were
performed using an algorithm based on that described in [6], but
extended for TCVQ and adapted to maintain the structure defined
in [5]. Representation symbols onto which no input symbols
are mapped are updated to zero (the average input value). The
initial codebook was constructed from independent random samples
from the distribution to be coded (maintaining of course the
structure defined in [5]).

Complex.   TCQ:MF   TCVQ:FMW   TCVQ:VW   TCQ:VW
32         4.92     5.05       5.15      5.16
64         5.13     5.22       5.34      5.39

Table 1: SNRs (in dB), at the same complexity, for the TCQs
of [2] (TCQ:MF), the TCVQs of [1] (TCVQ:FMW), the new
TCVQs (TCVQ:VW), and the TCQs of [5] (TCQ:VW) for the
Laplacian source, at R = 1.

As a measure of computational complexity we use the number of
evaluations of the (single-sample) distortion function necessary to
quantize one sample. Thus the complexity equals the product of
the number of states, the number of branches (sets) per state, and
the number of vectors per set: $2^{\nu} \cdot 2 \cdot 2^{RN-1} = 2^{\nu+RN}$. Table 1 shows
a comparison, at the same complexities, of the performances of the
TCQs of [2], the TCVQs of [1], the new TCVQs, and our TCQs [5]
for quantizing the Laplacian source at 1 b/sample. Even though
our TCVQs outperform those of [1], our TCQs are still superior.
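
The complexity measure above can be illustrated with a minimal sketch. It assumes an integer rate R, and the names `nu` (the base-2 logarithm of the number of states), `tcvq_complexity`, and `states_at_complexity` are hypothetical, introduced only for this illustration. It shows how, at a fixed complexity, a larger vector dimension leaves fewer trellis states.

```python
def tcvq_complexity(nu: int, rate: int, dim: int) -> int:
    """Distortion evaluations per sample: states * branches * vectors per set."""
    states = 2 ** nu                          # assumed: 2^nu trellis states
    branches_per_state = 2                    # two branches leave each state
    vectors_per_set = 2 ** (rate * dim - 1)   # set size for rate R, dimension N
    return states * branches_per_state * vectors_per_set


def states_at_complexity(complexity: int, rate: int, dim: int) -> int:
    """Number of states affordable at a given complexity (0 if none fits)."""
    per_state = 2 * 2 ** (rate * dim - 1)
    return complexity // per_state


if __name__ == "__main__":
    # At complexity 32 and R = 1, compare the scalar case with N = 2 and N = 4.
    for dim in (1, 2, 4):
        print(dim, states_at_complexity(32, rate=1, dim=dim))
```

Under these assumptions, the state budget halves each time the product RN grows by one, which is one way to read the trade-off between trellis size and vector dimension at constant complexity.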

In [1], only the two experiments shown in Table 1 were presented.
We performed various experiments for dimensions N equal
to 2, 4, and 8, and complexities up to 4096. They show that, at a
constant complexity, the performances of the TCVQs decrease as
the dimension increases.

Thus, for memoryless sources, at the same rate and computational
complexity, our TCQs are superior to our TCVQs.

An Efficient Algorithm for Optimal Tree
Pruning with Application to VQ

Xiaolin Wu and Yonggang Fang

A vector space can be recursively partitioned into k convex regions
with k - 1 cutting halfplanes. Such a k-partition can be embedded into
a binary tree of k leaves. This tree data structure was independently
developed by researchers in VQ [6], pattern classification [1], databases
[3] and computer graphics [4]. It was given different names in different
research communities: quantizer tree, classification tree, k-d tree, and
binary spatial partitioning (BSP) tree. In this abstract, we will use the
term BSP tree. Each leaf of the BSP tree corresponds to a resulting
convex region of the k-partition, and each internal node of the tree
and its two sons correspond to a bipartition by a cutting halfplane.
The BSP tree has a wide range of applications: source coding, pattern
recognition, artificial intelligence, computer graphics, etc. However,
for clarity and relevancy to the information theory symposium, we will
study the problem in the framework of VQ. The results apply to other
applications straightforwardly.

A tree-structured vector quantizer (TSVQ) has two attractive
advantages: low design complexity and low decoding complexity
compared with its unstructured counterparts. However, these
computational advantages are gained at the expense of codebook optimality.
Namely, the k-partition embedded into the BSP tree is not, in general,
a Voronoi diagram on the k centroids. Two avenues were opened to
improve the performance of TSVQ: 1) adaptive partitioning strategy
and 2) optimal tree pruning. When growing the spatial partitioning
tree we can optimize the cutting halfplanes one at a time by principal
analysis [9], or optimize several cutting halfplanes together by dynamic
programming as proposed by Wu [10], and/or elaborate on the order
of tree growth using a look-ahead scheme as suggested by Riskin and
Gray [7], and Wu and Zhang [9], rather than blind recursion. To
further improve the performance of TSVQ, one can generate an initial
BSP tree of larger size and then optimally prune it back to the required
size. The first work on pruned TSVQ was due to Chou et al. [2], which
was developed from an earlier work by Breiman et al. [1] on
classification trees. The pruning of a BSP tree is an optimization problem of
minimizing some objective function (say, quantization distortion)
under certain constraints (e.g., the size or the expected height of the tree,
the entropy of the leaves, etc.). The algorithm of [2] can find points on
the convex hull of the objective function. However, given an arbitrary
constraint value, this algorithm can only obtain the minimum through
time sharing. Recently, Lin et al. [8] showed that optimally pruning an
n-node initial tree to a k-node tree to minimize the total quantization
distortion can be effected in O(nk) time without time sharing.

In this research we found that the expected execution time of Lin et
al.'s algorithm can be reduced from $O(n^2)$ to $O(n \log n)$ if O(n) = O(k)
(a common case in practice). We observed that the mechanical
bottom-up testing in the search for the optimal subtree as conducted by the
algorithm of [8] was often unnecessary and wasteful. Decisions can be
made at a much earlier stage as to which subtrees can never be part
of the final optimal k-node tree and which top portion of the initial
BSP tree must stay in the final optimal k-node tree. Consequently, the
search domain, and hence the algorithm execution time, is drastically
reduced.

Let E(v) be the quantization distortion of the quantizer cell
corresponding to a BSP tree node v. Then for each internal node v,
we define its partitioning profit to be
$\Delta(v) = E(v) - [E(v.\mathrm{leftson}) + E(v.\mathrm{rightson})]$.
$\Delta(v)$ quantifies how much reduction the corresponding
bipartition of v brings to the total distortion. We sort all the internal
nodes of an initial BSP tree T using $\Delta(v)$ as the key in descending
order, and denote by $\lambda(v)$ the rank of v in this sorted list. Based on
the ranking of the partitioning profits, we introduce a special kind of
pruned trees of T, called stable roofs. A stable roof is defined to be a
subtree R of T rooted at the root of T such that all the $\lfloor |R|/2 \rfloor$ internal
nodes of R have the $\lfloor |R|/2 \rfloor$ largest partitioning profits among the
internal nodes of T, namely, $\lambda(v) \le \lfloor |R|/2 \rfloor$ if v is an internal node of R,
where |R| is the total number of nodes in R. By the above definition,
the initial BSP tree T is itself a stable roof. In general, there may exist
many stable roofs $R_i \subseteq T$. The subscript i in $R_i$ denotes the size of the
stable roof $R_i$, i.e., $i = |R_i|$. For simplicity we assume that all $\Delta(v)$
are distinct so that no two stable roofs have the same size.
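
The definitions above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the tree is assumed to be given as a dictionary `children` mapping each internal node to its (left, right) sons, `distortion` maps every node to E(v), and all function names are hypothetical. Instead of a general connected component algorithm, the sketch uses a simpler equivalent check: the m most profitable internal nodes form a stable roof exactly when the root is among them and each of the others has its parent among them.

```python
def partition_profits(children, distortion):
    """Profit of each internal node: E(v) - [E(left son) + E(right son)]."""
    return {
        v: distortion[v] - (distortion[left] + distortion[right])
        for v, (left, right) in children.items()
    }


def stable_roofs(children, distortion, root):
    """Return (size, internal-node set) for every stable roof with >= 1 split.

    Internal nodes are inserted in descending order of profit. An inserted
    non-root node whose parent has not been inserted yet is an "orphan";
    the inserted set is a subtree rooted at `root` iff there are no orphans.
    """
    parent = {c: v for v, sons in children.items() for c in sons}
    profit = partition_profits(children, distortion)
    order = sorted(profit, key=profit.get, reverse=True)

    inserted = set()
    orphans = 0
    roofs = []
    for v in order:
        if v != root and parent[v] not in inserted:
            orphans += 1
        # Sons inserted earlier were counted as orphans; they now have a parent.
        orphans -= sum(1 for c in children[v] if c in inserted)
        inserted.add(v)
        if root in inserted and orphans == 0:
            # m internal nodes plus m + 1 leaves give a roof of 2m + 1 nodes.
            roofs.append((2 * len(inserted) + 1, frozenset(inserted)))
    return roofs


if __name__ == "__main__":
    # Illustrative 7-node tree: node 0 is the root, nodes 3..6 are leaves.
    children = {0: (1, 2), 1: (3, 4), 2: (5, 6)}
    distortion = {0: 100, 1: 50, 2: 30, 3: 5, 4: 5, 5: 14, 6: 14}
    for size, internal in stable_roofs(children, distortion, root=0):
        print(size, sorted(internal))
```

In the illustrative tree, node 1 has the largest profit but cannot form a roof on its own because the root has not yet been inserted; only once the root is added do the two most profitable nodes form a stable roof. The distortion values are made up for the example.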

Now investigate our problem of constructing the optimal pruned
k-node tree $T_{opt}(k)$ from the initial BSP tree T, k < n = |T|. If there
exists a stable roof $R_k$, then $R_k = T_{opt}(k)$, because any other internal
nodes of T that are not internal nodes of $R_k$ have smaller profits,
hence they cannot reduce the total quantization distortion further by
replacing any of the internal nodes of $R_k$. Even if $R_k$ does not exist,
we may find two stable roofs $R_i$ and $R_j$ such that i < k < j. It is
easy to see, in this case, that $R_i \subset T_{opt}(k)$ (in fact, $R_i$ is also a stable
roof of $T_{opt}(k)$), and that $v \notin R_j$ implies $v \notin T_{opt}(k)$. Therefore, we
only need to prune the nodes between $R_i$ and $R_j$, i.e., those v such that
$v \in R_j$ but $v \notin R_i$, in search for $T_{opt}(k)$. The worst case is when T is
the only stable roof of T; then we have no bounds on the shape of $T_{opt}(k)$.
Fortunately, this worst case does not occur often in practice.

The stable roofs $R_i$ and $R_j$ such that i < k < j serve as the upper
and lower bounds on the shape of $T_{opt}(k)$. If the gap between $R_i$ and
$R_j$ has m nodes, then the optimal pruning can be done in $O(m^2)$ time
using the dynamic programming paradigm. In our experiments, m was
very small independent of k, so the cost of $O(m^2)$ can be considered
negligible. The family of stable roofs with respect to T can be found by
the standard graph connected component algorithm. Starting with the
initial graph consisting of the root of T and its two edges, we insert into
the graph the internal nodes v with their edges in the descending order
of $\Delta(v)$. In the insertion process, we dynamically detect the connected
components containing the root of T and examine if they are stable
roofs. Computing $R_i$ and $R_j$ requires $O(n \log n)$ time, which is spent
on sorting $\Delta(v)$ and detecting stable roofs. This establishes our claim on
the time complexity of computing $T_{opt}(k)$.

ABSTRACT

Two results are given. First, using a result of Csiszár, the
asymptotic (i.e., high resolution/low distortion) performance for
entropy constrained tessellating vector quantization, heuristically
derived by Gersho, is proven for all sources with finite differential
entropy. This implies, using Gersho's Conjecture and Zador's
formula, that tessellating vector quantizers are asymptotically
optimal for this broad class of sources, and generalizes a rigorous
result of Gish and Pierce from the scalar to vector case. Second,
the asymptotic performance is established for Zamir and Feder's
randomized lattice quantization. With the only assumption that
the source has finite differential entropy, it is proven that the low
distortion performance of the Zamir-Feder universal vector
quantizer is asymptotically the same as that of the deterministic lattice
quantizer.

SUMMARY 

Let $Q_N^k$ denote an N-level k-dimensional vector quantizer, and
let $X^k$ be the k-dimensional random vector with density f and
differential entropy h(f) to be quantized. Let the rth power
quantization distortion be defined in the usual way,

$$D_r(Q_N^k(X^k)) = \frac{1}{k} E\|X^k - Q_N^k(X^k)\|^r,$$

where $\|\cdot\|$ denotes the Euclidean norm, and r > 0. Denote the
Shannon entropy of $Q_N^k(X^k)$ by $H(Q_N^k)$, and for H > 0 let

$$D_h(H, k, r) = \inf_N \inf_{Q_N^k :\, H(Q_N^k) \le H} D_r(Q_N^k(X^k)),$$ (1)

the distortion of an optimal k-dimensional vector quantizer with
entropy H. Gersho [2] heuristically derived the asymptotic
performance of quantizers given by the tessellation of $\mathbb{R}^k$ by a convex
polytope P. He found that if $Q_{d,P}^k$ denotes the tessellating
quantizer with rth power distortion d, then

$$\lim_{d \to 0} d \, 2^{(r/k) H(Q_{d,P}^k)} = l(P) \, 2^{(r/k) h(f)},$$ (2)

where l(P) is the normalized rth moment of P. We prove (2)
in great generality. Our Theorem 1 establishes the asymptotic
entropy constrained performance of lattice quantizers without any
smoothness or compact support condition on the density. Thus the
often quoted formula

$$\lim_{\Delta \to 0} \left[ H(Q_\Delta) + \tfrac{1}{2} \log 12 D_2(Q_\Delta) \right] = h(f),$$ (3)

on the asymptotics of uniform quantizers is proved for all densities
such that $Q_\Delta$ has finite Shannon entropy for some step size $\Delta$, and
$h(f) > -\infty$, strengthening Gish and Pierce's result.

Ziv [4] introduced a randomized ("dithered") quantization
scheme for scalar random variables which was extended by Zamir
and Feder [3] for lattice vector quantizers.

We prove in Theorem 2 that for a large class of densities the
asymptotic performance of the randomized lattice quantizer and
the asymptotic performance of the ordinary lattice quantizer are
the same.


Theorem 1 If $|h(f)| < \infty$ and $H(Q_{d,P}^k(X^k)) < \infty$ for some d > 0,
then

$$\lim_{d \to 0} d \, 2^{(r/k) H(Q_{d,P}^k)} = l(P) \, 2^{(r/k) h(f)}.$$ (4)

Furthermore, if Zador's formula holds for f, l(P) = C(k, r), and
Gersho's conjecture holds, then

$$\lim_{d \to 0} \frac{D_h(H(Q_{d,P}^k), k, r)}{d} = 1,$$ (5)

i.e., the quantizer $Q_{d,P}^k$ is asymptotically optimal.

A standard technique using the vector Shannon lower bound
on the kth order rate-distortion function $R_k(d)$ then gives for mean
squared distortion

$$\limsup_{d \to 0} \left[ \tfrac{1}{k} H(Q_{d,P}^k) - R_k(d) \right] \le \tfrac{1}{2} \log 2\pi e \, l(P).$$ (6)

The condition for (6) to hold is that $E\|X^k\|^2 < \infty$, $|h(f)| < \infty$,
and $H(Q_{d,P}^k(X^k)) < \infty$ for some d > 0.

Theorem 2 Suppose the conditions of Theorem 1 hold. Then the
rate $H(Q_{d,V})$ of the randomized lattice quantizer with basic cell V
and rth power distortion d satisfies

$$\lim_{d \to 0} d \, 2^{(r/k) H(Q_{d,V})} = l(V) \, 2^{(r/k) h(f)},$$ (7)

i.e., the asymptotic performance of the randomized lattice
quantizer is the same as the asymptotic performance of the ordinary
(non-randomized) lattice quantizer given by (4).

Corollary 1 For r = 2, $|h(f)| < \infty$, and $E\|X^k\|^2 < \infty$.

We consider the problem of transmitting images
over noisy channels. Source coding is done
by a fixed rate scheme (DCT and Product
Pyramid VQ with (L-K)-thresholding) [1], that
allows us to calculate very precisely the expected
mean square error caused by data compression.
This, together with the a-priori knowledge of the
bit-sensitivity of the compressed image data,
enables us to perform a highly efficient equal or
unequal error protection for image transmission
over noisy channels.

Given the overall rate $R = R_s + R_c$, where $R_s$ and
$R_c$ denote the source rate and the channel rate
respectively in bits per pixel, and the channel SNR $E_s/N_0$,
one can calculate the optimal ratio of source to channel
rate. First of all this requires the knowledge of
the mean $E(\rho(X))$ and variance $\mathrm{var}(\rho(X))$ of source
vectors X for the rate $R_s$ to calculate the mean square
error caused by source coding, mse_s [1]. Furthermore,
we need the bit-sensitivity $A_i$ of the compressed data.
Therefore, it is sufficient to look at one of the coding
units (CU) the whole image is divided into. These
bit-sensitivities can be derived before image transmission,
again with the knowledge of mean and variance of the
source vectors X.

For the given channel SNR, we can compute the bit error
probability $p_b$ of our channel code, a 64 state RCPC-code
[2], and with the average contribution $A_i$ of an error
of bit i on any compressed CU, we obtain the mean
square error caused by channel errors

$$\mathrm{mse}_c = p_b \cdot \sum_{i=0}^{s-1} A_i, \qquad A_i \in (A_{\mathrm{pyr}}, A_{\mathrm{norm}}),$$

where s denotes the number of bits of the compressed
CU. Finally, we have to optimize the overall mse =
mse_s + mse_c for the given rate. Hereby, we either use
an equal error protection (EEP) channel code for the
whole data or, even better, several levels of protection
for the different sensitive bits (UEP).

All these a-priori calculations allow us to adapt the
source and channel coding to the SNR so as to obtain
the lowest mse for a fixed rate or to calculate the
required rate for a given maximum mse before transmission.
Simulation results (see Figure 1: unprotected,
EEP, UEP) show the performance of our scheme and
gains of up to 5 (8) dB (unprotected) in PSNR and
up to 4 (8) dB in $E_s/N_0$ compared to JPEG (Joint
Photographic Experts Group), especially for very noisy
channels and low overall rates R.
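
The rate split described above can be sketched as a small search over candidate operating points. This is a hedged illustration only: the function names, the dictionary layout, and all numeric values are assumptions made up for the example, and the sketch treats the equal error protection case, where a single bit error probability applies to every bit of a coding unit.

```python
def channel_mse(p_b, sensitivities):
    """Channel-induced error: bit error probability times the summed sensitivities."""
    return p_b * sum(sensitivities)


def overall_mse(candidate):
    """Overall mse = source-coding mse + channel-induced mse."""
    return candidate["mse_s"] + channel_mse(candidate["p_b"], candidate["sensitivities"])


def best_split(candidates):
    """Pick the (source rate, channel rate) candidate with the lowest overall mse."""
    return min(candidates, key=overall_mse)


if __name__ == "__main__":
    # Made-up operating points, all with the same overall rate of 0.5 bits per pixel.
    sens = [1000.0, 500.0, 100.0]
    candidates = [
        {"rs": 0.45, "rc": 0.05, "mse_s": 55.0, "p_b": 5e-2, "sensitivities": sens},
        {"rs": 0.40, "rc": 0.10, "mse_s": 60.0, "p_b": 1e-2, "sensitivities": sens},
        {"rs": 0.30, "rc": 0.20, "mse_s": 80.0, "p_b": 1e-4, "sensitivities": sens},
    ]
    best = best_split(candidates)
    print(best["rs"], best["rc"], overall_mse(best))
```

In practice the sensitivities and the number of bits per coding unit would change with the source rate; the example keeps them fixed only to stay short.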


Figure 1: PSNR vs. channel $E_s/N_0$ of the PVQ and
JPEG for LENA at 0.5 bits per pixel

Summary

A  finite-state  vector  quantizer  (FSVQ)  is  a  finite-state  machine 
used  for  data  compression.  An  FSVQ  can  be  viewed  as  a  collection 
of  memoryless  vector  quantizers  (VQ’s)  where  each  input  vector  is 
encoded  using  a  VQ  associated  with  the  current  encoder  state;  the 
current  state  and  the  codevector  selected  (or  the  corresponding  index) 
determine  the  next  encoder  state  [1]. 
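
The finite-state structure described above can be sketched as follows. This is a minimal illustration under stated assumptions: `codebooks` is a list with one codebook per state, `next_state` is a user-supplied function of the current state and the selected index, and all names are hypothetical rather than taken from [1].

```python
def nearest_index(codebook, x):
    """Index of the codevector closest to x in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(codebook[j], x)),
    )


def fsvq_encode(vectors, codebooks, next_state, state=0):
    """Encode each vector with the VQ of the current state, then update the state."""
    indices = []
    for x in vectors:
        j = nearest_index(codebooks[state], x)
        indices.append(j)
        state = next_state(state, j)
    return indices


def fsvq_decode(indices, codebooks, next_state, state=0):
    """Track the encoder state from the received indices and look up codevectors."""
    output = []
    for j in indices:
        output.append(codebooks[state][j])
        state = next_state(state, j)
    return output


if __name__ == "__main__":
    # Two states, each with a two-entry scalar codebook (illustrative values).
    codebooks = [[(0.0,), (1.0,)], [(2.0,), (3.0,)]]
    step = lambda state, index: index  # assumed next-state rule for the demo
    sent = fsvq_encode([(0.9,), (2.2,), (0.1,)], codebooks, step)
    print(sent, fsvq_decode(sent, codebooks, step))
```

Because the decoder derives its state from the indices it receives, a single corrupted index can send the decoder into the wrong state for all later vectors.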

Unlike an ordinary memoryless VQ, an FSVQ (with K states)
can take advantage of the memory between successive source vectors
by incorporating a feedback structure which enables it to choose the
appropriate VQ (from the set of K different VQ's), given the past
behavior [1]. While in the noiseless case this feedback structure leads to
a performance gain over ordinary memoryless VQ, in the presence of
channel noise, the same feedback structure renders FSVQ very
sensitive to the channel error propagation and leads to severe degradation
in the performance of the FSVQ as compared to the memoryless VQ.
Indeed, FSVQ designed as in [1] falls apart in the presence of channel
noise.

In this paper, we propose two modified FSVQ systems (NC-FSVQ1
and NC-FSVQ2) that are robust to channel noise. In order to
design NC-FSVQ1, we first redesign the FSVQ system taking into
account the channel noise, under the assumption that the decoder has
perfect knowledge of the encoder state sequence. This leads to the
design of an FSVQ system in which each state VQ is a
channel-optimized VQ [2]. The transmission of "protected" encoder state
indices is needed to allow the decoder to track the encoder state
sequence. In order to keep the overhead information (consisting of
encoder state indices) low, the encoder state index is transmitted
periodically, say once for every n input vectors. Then, given the
state indices at times k and k+n and the received codewords at times
i = k, k+1, ..., k+n-1, the Viterbi algorithm, with the maximum
likelihood criterion, is used to estimate the encoder state indices at
times i = k+1, k+2, ..., k+n-1 at the decoder. Since some of
the encoder state indices at times i = k+1, k+2, ..., k+n-1 may
be incorrectly estimated by the decoder, we have developed an
algorithm to do a judicious indexing of the codevectors among the states,
so that if an encoder state is incorrectly decoded while the codeword
is received correctly, the error introduced is not substantial (it should
be noted that this indexing provides protection against state index
error, while protection against error in the received codeword within
a state is implicitly provided by the channel optimized state VQs).
The resulting FSVQ system, called NC-FSVQ1, performs significantly
better than the ordinary FSVQ [1] in the presence of channel noise
(memoryless binary symmetric channel assumed).

We have used NC-FSVQ1 to encode the Gauss-Markov (G-M)
source with a correlation coefficient ρ = 0.9. When the channel is
noisy, NC-FSVQ1 outperforms ordinary FSVQ significantly at all
levels of channel noise. We observed that for a block size of 4, under
noisy channel conditions, NC-FSVQ1 performed close to or better
than the channel-optimized VQ (CO-VQ) [2]; for ε = 0.005 to 0.05,
the performances are close, while at ε = 0.1, NC-FSVQ1 outperforms
CO-VQ by 0.4-0.9 dB with a decoding delay of 6 vectors (ε is the
bit error rate). The performance gain of NC-FSVQ1 over CO-VQ
increases as ρ increases and for ρ = 0.95, the gain is more than 2 dB.

This work was supported in part by National Science Foundation grants NSFD
MIP-86-57311 and NSFD CDR-85-00108, and in part by grants from NTT
Corporation and General Electric.

Although the NC-FSVQ1 offers robustness against channel noise,
there are two problems associated with such a scheme. First, it suffers
from a delay at the receiver side. The second problem relates to the
overhead involved in explicitly transmitting the protected encoder
state information; the next encoder state information is implicitly
contained, at least partially, in the current transmitted codeword and
it seems that protecting the codeword (or part of it) instead of the
state (as is done in NC-FSVQ1) might lead to a similar performance
(in terms of distortion) at a lower overhead rate. Let us now focus on
the second issue. In a given FSVQ, the state information is embedded
in the codeword in an unstraightforward way. In other words,
we do not know which bits in the codeword should be protected in
order to effectively protect the state information. In an effort to
resolve the above issue, we modify the FSVQ design algorithm [3] such
that all the state information is forced to be contained in the $l = \log_2 K$
most significant bits of the codeword. In addition, we modify the
next-state function such that the state at time n + 1 depends only on
the codeword at time n (independent of the state at time n, see [3]).
Under these conditions, using the development in [4], we have
formulated the joint source-channel coding problem for the modified FSVQ
system, developed necessary conditions of optimality and, based on
these conditions, described a design algorithm leading to the so-called
NC-FSVQ2 system [3].
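
A next-state function of the kind described above can be sketched in a few lines. The sketch assumes that K is a power of two, that codeword indices are `bits`-bit integers, and that the state is read directly from the top bits of the index; this direct mapping and the function name are assumptions for illustration, not the design of [3].

```python
def next_state_from_codeword(index: int, bits: int, num_states: int) -> int:
    """Next state taken from the log2(K) most significant bits of the codeword."""
    state_bits = (num_states - 1).bit_length()  # equals log2(K) for K a power of two
    return index >> (bits - state_bits)


if __name__ == "__main__":
    # 5-bit codeword 10110 with K = 4 states: the top two bits (10) give state 2.
    print(next_state_from_codeword(0b10110, bits=5, num_states=4))
```

With such a rule the decoder needs no memory of earlier states, so an error in one received codeword affects the state for the following vector only.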

We also used NC-FSVQ2 to encode the G-M source. Under noisy
channel conditions, NC-FSVQ2 performs significantly better than
ordinary FSVQ and, as compared to CO-VQ, it achieves a gain of 0.4-0.8
dB at ε = 0.005 and 0.7-1.0 dB at ε = 0.1. Again, this gain increases
as ρ increases. In contrast to NC-FSVQ1, NC-FSVQ2 does not have
any delay at the decoder and there is no need for a separate channel
code. We also used NC-FSVQ1 and NC-FSVQ2 to encode the
speech LSP parameters [5] and achieved noticeable gains over
ordinary FSVQ and CO-VQ under noisy channel conditions [3].

ABSTRACT

Upper and lower bounds are derived for the average number of
facets per cell in the encoder partition of binary Tree-Structured
Vector Quantizers. The achievability of the bounds is described as
well. It is shown in particular that the average number of facets
per cell for unbalanced trees must lie asymptotically between 3 and
4 in $\mathbb{R}^2$, and each of these bounds can be achieved, whereas for
higher dimensions it is shown that an arbitrarily large percentage
of the cells can each have a linear number (in codebook size) of
facets. Analogous results are also indicated for balanced trees.

SUMMARY

A binary Tree-Structured Vector Quantizer (TSVQ) Q can
formally be defined recursively by cutting (or splitting) one cell of
an existing TSVQ by a hyperplane. As in general VQ, TSVQ's also
partition $\mathbb{R}^d$ into a finite set of convex polytopal cells. This follows
from the fact that every encoding region is a finite intersection of
half-spaces. It will be assumed throughout that the intersection
of any cell-splitting hyperplane with a face of the split cell is of
lower dimension than that of the face itself, or equivalently that a
general position restriction holds.

A facet of a convex polytope in $\mathbb{R}^d$ is any (d - 1)-dimensional
face of the polytope. Two cells in a quantizer partition are
neighbors if each has a distinct facet, one of which is a subset of the
other. Equivalently, two cells are neighbors if the intersection of
their closures has dimension d - 1. For a VQ encoder partition in
general position, there is a one-to-one correspondence between the
facets of a cell and the cell's neighbors. However, for TSVQ, it is
possible that one cell could be adjacent to several other cells via
the same facet; in general, the number of facets per cell is less than
or equal to the number of neighbors of the cell. Often, however,
these two quantities are very similar or equal. For a given convex
polytopal partition Q of $\mathbb{R}^d$ into n cells, define

1) $F_d(n)$ = average number of facets per cell in Q.

2) $G_d(n) = n F_d(n)$

3) $M_d(n)$ = maximum number of facets of a cell in Q.

Note that since every cell of any vector quantizer with n
codevectors cannot share more than one facet with any other cell we
obtain the trivial upper bound $F_d(n) \le n - 1$. In two dimensions,
a straightforward application of Euler's theorem for planar graphs
shows that $F_2(n) < 6$ (i.e., not restricted to TSVQ).

In this paper we derive several bounds on $F_d(n)$ for TSVQ
and point out the achievability of these. Specifically, it is shown
that for 2-dimensional unbalanced TSVQ, the average number of
facets per cell is asymptotically bounded above by 4 and below
by 3, and that the bounds are achievable. For higher dimensional
spaces an upper bound of n/2 and a lower bound of 3 are given.
It is also shown that n/4 and 3 respectively are achievable in this
case. At present, it is an open question as to whether the n/2
bound is achievable. In $\mathbb{R}$, it is trivially always the case that
$F_1(n) = 2 - 2/n$.

In $\mathbb{R}^2$, it is shown that if an asymptotically large fraction of
the TSVQ cells are bounded, then $F_2(n) \to 4$. This would lend
some support to the assumption made in [1] that $F_d(n) = 2d$ for
the case d = 2. However, for d > 2, this might not be the case.
It is shown analogously that for balanced TSVQ with d > 2 the
upper bound on the average number of facets per cell is reduced
to $\log_2 n$. It should be emphasized, though, that the bounds
presented are best and worst cases over the class of
all TSVQ's, and it is a question for future study as to how likely
they are to occur for various practical TSVQ systems.

Proposition 1 For unbalanced TSVQ, the average number of
facets per cell satisfies

$3 - 4/n \le F_d(n) \le n/2 - 1/2$ for d > 2, n > 1

$3 - 4/n \le F_d(n) \le 4 - 7/n$ for d = 2, n > 3. (1)

The next several results exhibit the bounds' achievability.

Proposition 2 For every d > 2 and n > 1 there exists an
unbalanced TSVQ such that $F_d(n) > n/4$.

Proposition 3 For d = 2 and every n > 2 there exists an
unbalanced TSVQ such that $F_d(n) = 4 - 7/n$.

Proposition 4 For every d > 2 and n > 1 there exists an
unbalanced TSVQ such that $F_d(n) = 3 - 4/n$.

The following corollary shows that there exist d-dimensional
TSVQ's such that an arbitrarily large fraction of the cells each
have a linear number (in codebook size) of facets.

Corollary 1 For every d > 2, n > 1, and $\alpha \in (0, 1)$, there exists
a TSVQ with n cells such that at least $\alpha n$ of the cells each have
at least $(1 - \alpha) n$ facets.

For  balanced  trees  similar  results  are  obtained,  though  with 
a  reduction  from  linear  to  logarithmic  bounds.  The  results  are 
stated  in  terms  of  the  number  of  cells  n,  in  the  TSVQ,  though  it 
should  be  remembered  that  balanced  trees  only  exist  when  n  is 
some integer power of 2. In the following proposition, the achievability
of the lower bound for $d > 2$ and the upper bound for $d = 2$
are  analogous  to  the  unbalanced  case.  However,  it  is  unknown  at 
present  whether  the  upper  bound  log2  n  is  achievable;  in  fact  it 
is  unknown  whether,  for  a  fixed  d  >  2,  it  is  possible  to  exhibit 
balanced  TSVQ’s  such  that  Fd(n)  is  unbounded. 

Corollary 2 For balanced TSVQ,

$$3 - 4/n \le F_d(n) \le \log_2 n \quad \text{for } d > 2,\ n > 0$$

$$3 - 4/n \le F_d(n) \le 4 - 8/n \quad \text{for } d = 2,\ n > 0. \qquad (2)$$

Abstract: This paper presents how to study the geometry of Voronoi
regions  in  an  arbitrary  vector  quantizer.  Methods  to  find  the  location,  the 
extent,  and  the  neighbors  of  any  region  are  summarized.  Application  to 
fast  encoding  is  emphasized. 

I.  Introduction 

It  is  well  known  that  a  vector  quantizer  (VQ)  performs  better,  in 
terms  of  signal-to-noise  ratio,  than  a  scalar  quantizer  [4].  The 
improvement  increases  with  the  dimension,  but  the  price  paid  is 
complexity.  In  particular,  the  encoding  process  is  slower.  In  the  case 
of  nearest  neighbor  quantization,  which  this  paper  considers,  the 
straightforward  encoding  method  is  calculating  the  squared 
Euclidean  distance 

$$d(w, r_i) = \|w - r_i\|^2 \qquad (1)$$

between an input vector $w$ and every reconstruction vector $r_i$, $i = 1, \dots, n$, and selecting the codeword that gives the minimum distance. The set of vectors that are encoded as a certain codeword $k$ according to this rule is called the Voronoi region (VR)

$$V_k = \{ w : d(w, r_k) \le d(w, r_i);\ i = 1, \dots, n \} \qquad (2)$$
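As a point of reference for the faster methods discussed later, the straightforward full-search encoder can be sketched as follows. This is only an illustrative sketch: the function names and the small two-dimensional codebook are assumptions made for the example and do not come from the paper.

```python
def squared_distance(w, r):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((wi - ri) ** 2 for wi, ri in zip(w, r))


def full_search_encode(w, codebook):
    """Return the index of the reconstruction vector closest to w.

    Every one of the n codewords is examined, which is the cost the
    faster encoding methods try to avoid.
    """
    best_index = 0
    best_dist = squared_distance(w, codebook[0])
    for i in range(1, len(codebook)):
        dist = squared_distance(w, codebook[i])
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


# Hypothetical codebook: four reconstruction vectors in the plane.
codebook = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
print(full_search_encode((1.8, 0.3), codebook))
```

The input (1.8, 0.3) lies in the Voronoi region of the second codeword, so the encoder selects index 1.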

Sometimes  suboptimal  VQs  are  accepted  in  order  to  decrease  the 
encoding  time.  Several  structures  have  been  developed  for  which 
fast  search  algorithms  exist,  e.g.  lattice  or  multistage  coders. 
However,  there  are  also  methods  to  improve  the  encoding  speed  for 
arbitrary  VQs,  without  paying  with  signal-to-noise  ratio.  The  meth¬ 
ods  often  require  precomputing  some  geometrical  properties  of  the 
VRs.  A  new  method  to  obtain  such  information  is  presented  here,  as 
well  as  an  encoding  algorithm  based  on  the  precomputed  data. 

II.  Examining  the  geometry  of  Voronoi  regions 

Some  relevant  types  of  problems  concerning  the  structure  of  given 
VRs  are: 

1.  What  values  of  a  certain  component  may  vectors  in  this  VR 
take  on? 

2. On which side of a certain hyperplane does this VR lie, or is the
VR intersected?

3. Do these two VRs have a common face?


The three questions are closely related. All of them have applications in different algorithms for the design of fast encoders, see below. Probabilistic methods have been proposed to obtain approximate, or likely, answers to them [2], [3]. In this section, deterministic methods, based on different applications of linear programming, are presented for solving these and related problems reliably.

Consider the following standard formulation of a linear programming problem:

$$\min\ c^T x \quad \text{when} \quad \begin{cases} A x = b \\ x \ge 0 \end{cases} \qquad (3)$$

Much research and much literature have been devoted to methods for solving it. Two of the main approaches are the simplex method and Karmarkar's algorithm, both having numerous variations [1]. From optimization theory it is known that there exists a dual problem to (3),

$$\max\ b^T w \quad \text{when} \quad A^T w \le c \qquad (4)$$


the  solution  of  which  is  strongly  connected  to  the  solution  of  (3). 
Actually,  both  mentioned  methods  generate  the  solution  of  (4)  as  a 
by-product  when  solving  (3). 

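The inequality constraints in (4) form a convex polytope. They describe the VR $V_k$ (2) of a certain codeword $k$ if

$$A = (a_1 \cdots a_{k-1}\ a_{k+1} \cdots a_n) \qquad (5a)$$

$$c = (c_1 \cdots c_{k-1}\ c_{k+1} \cdots c_n)^T \qquad (5b)$$

where for every $i$

$$a_i = r_i - r_k, \qquad c_i = \frac{\|r_i\|^2 - \|r_k\|^2}{2}.$$

Thus, the dual problem can be used for finding extrema of a VR.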
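The half-space description follows from expanding $\|w - r_k\|^2 \le \|w - r_i\|^2$, which gives $(r_i - r_k)^T w \le (\|r_i\|^2 - \|r_k\|^2)/2$. The sketch below builds these constraints for one codeword and uses them as a membership test. The function names and the example codebook are assumptions made for illustration, and no linear-programming solver is involved here.

```python
def voronoi_halfspaces(codebook, k):
    """Return (rows, rhs) with one constraint a_i^T w <= c_i per codeword i != k.

    a_i = r_i - r_k and c_i = (||r_i||^2 - ||r_k||^2) / 2, obtained by
    expanding ||w - r_k||^2 <= ||w - r_i||^2.
    """
    r_k = codebook[k]
    norm_k = sum(x * x for x in r_k)
    rows, rhs = [], []
    for i, r_i in enumerate(codebook):
        if i == k:
            continue
        rows.append(tuple(a - b for a, b in zip(r_i, r_k)))
        rhs.append((sum(x * x for x in r_i) - norm_k) / 2.0)
    return rows, rhs


def in_region(w, rows, rhs, tol=1e-12):
    """True if w satisfies every half-space constraint (up to tol)."""
    return all(
        sum(a * x for a, x in zip(row, w)) <= c + tol
        for row, c in zip(rows, rhs)
    )


# Hypothetical codebook: four reconstruction vectors in the plane.
codebook = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
rows, rhs = voronoi_halfspaces(codebook, 0)
print(rows, rhs)
print(in_region((0.5, 0.5), rows, rhs), in_region((1.5, 0.5), rows, rhs))
```

With these constraints in hand, maximizing $b^T w$ over the region for different vectors $b$ is a standard linear program, which is the use the text describes.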


Depending on the choice of the dual objective vector $b$, different extrema will be found and different properties of the VR will be investigated. In particular, if $b$ is chosen first as a unit vector and next as the same vector negated, problem 1 above is solved by two linear programs. If this is repeated for all coordinates and all VRs, a circumscribed hyperrectangle will be found for each VR, which is the precomputed information required for encoding with the Projection Method [2].

Problem 2 is solved similarly by two linear programs, if $b$ is set orthogonal to the given hyperplane, pointing in both directions. If the two extrema lie on the same side of the plane, so does the whole VR; otherwise it is intersected. The answer to this kind of question is vital for the design of the decision tree used in the Binary Hyperplane Testing Algorithm [3].

To test the neighborship between VRs $j$ and $k$ (problem 3), $b$ is chosen equal to $a_j$, which is orthogonal to the common face of $V_j$ and $V_k$, if such a face exists, whereas $A$ and $c$ as before describe $V_k$ (5). With this input, a linear programming algorithm will return the point $w$ in $V_k$ whose projection on $a_j$ is closest to $r_j$. If the two VRs have a common face, the dual optimum $w$ will inevitably lie on it. The primal optimum shows whether this has occurred; the face was reached if and only if the component of $x$ corresponding to $a_j$ is greater than zero.

A VR is defined by $n - 1$ linear inequalities as in (2) or (4). Some of them are in general redundant. Define the set of neighbors $N_k$ to a codeword $k$ as all codewords whose VRs have a face in common with $V_k$. The corresponding inequalities are the only ones needed to be considered in order to determine if a vector $w$ belongs to a certain VR $V_k$:

$$V_k = \{ w : a_i^T w \le c_i;\ i \in N_k \} \qquad (6)$$

Now,  solving  problem  3  for  all  pairs  of  VRs  in  a  VQ  generates  the 
complete neighborship table $N_i$, $i = 1, \dots, n$, which is a useful tool for
analysis  as  well  as  in  applications.  Because  the  description  (6) 
defines  exactly  the  same  region  as  (2),  but  more  economically,  the 
table  speeds  up  other  geometrical  studies,  such  as  the  solution  of 
problems  1  and  2.  In  addition,  it  makes  a  new  approach  to  fast 
encoding  possible,  called  neighbor  descent. 

III. The "neighbor descent" encoding method

Suppose that a vector $w$ is to be encoded and that there is reason to believe that $r_k$ is a good reconstruction vector for $w$. Calculate the distortions for all the neighbors of $k$, that is, the distances $d(w, r_i)$, $i \in N_k$. Replace $k$ with the neighbor that has the smallest distortion and restart. If no codeword in $N_k$ is better than $k$ itself, then stop.

Theorem  of  uniqueness:  In  any  VQ,  for  any  input  w,  no  more  than 
one  codeword  can  have  a  smaller  distortion  than  all  its  neighbors. 

A necessity for the success of the method is that a path through neighboring VRs, along which the distortion $d(w, r_k)$ is monotonically decreasing, does not terminate in a suboptimal local minimum. The above theorem states that this can never be the case. Its proof follows as a consequence of (6) and the observation that a vector cannot belong to the interior of more than one VR.
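The descent rule described in this section can be sketched in a few lines. The sketch assumes the neighbor table is already available; here it is written out by hand for a hypothetical one-dimensional codebook, where the Voronoi neighbors of a codeword are simply the adjacent codewords in sorted order. All names are illustrative and not taken from the paper.

```python
def squared_distance(w, r):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((wi - ri) ** 2 for wi, ri in zip(w, r))


def neighbor_descent(w, codebook, neighbors, start):
    """Walk from `start` to better neighbors until no neighbor improves.

    Returns (index, number_of_distance_evaluations).
    """
    k = start
    d_k = squared_distance(w, codebook[k])
    evaluations = 1
    while True:
        best, best_d = k, d_k
        for i in neighbors[k]:
            d_i = squared_distance(w, codebook[i])
            evaluations += 1
            if d_i < best_d:
                best, best_d = i, d_i
        if best == k:  # no neighbor is better than k itself: stop
            return k, evaluations
        k, d_k = best, best_d


# Hypothetical 1-D codebook; neighbors are the adjacent sorted codewords.
codebook = [(0.0,), (1.0,), (2.5,), (4.0,), (7.0,)]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
index, evaluations = neighbor_descent((3.6,), codebook, neighbors, start=0)
print(index, evaluations)
```

In this toy case a poor starting guess costs more evaluations than a full search would; the savings reported in the text presuppose a good initial guess and a codebook in which each codeword has few neighbors relative to $n$.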

The performance of the neighbor descent method was evaluated in experiments on VQs without an induced structure. The results show that most of the $n$ distance calculations can be avoided with the neighbor descent method. The reduction is greatest for VQs with high bit rates, or, if the rate is kept constant, in many dimensions.

Abstract: An asymptotically optimal code with a fidelity criterion for any memoryless Gaussian source is proposed. It is shown that the proposed code achieves the rate-distortion bound with probability $1 - \varepsilon$ for any given distortion level under the squared-error criterion. The asymptotic behavior of the code performance is also analyzed in detail.

Introduction 

Developing data compression schemes with a fidelity criterion is essential, especially for continuous sources. It is therefore important to devise a general method to encode an output sequence of practical analogue sources.

In this manuscript, any memoryless Gaussian source is considered, but it is sufficient to treat one with zero mean and unit variance. Let $x = x_1 \cdots x_n$ be an $n$-tuple of source symbols that satisfies $x_i \sim N(0, 1^2)$ for all $i = 1, \dots, n$. The probability density function can be written as

$$p(x) = (2\pi)^{-n/2} \exp\left[-\tfrac{1}{2}\left(x_1^2 + x_2^2 + \cdots + x_n^2\right)\right], \qquad (1)$$

and each $x$ can be treated as an element of $\mathbb{R}^n$. An original word $x = x_1 \cdots x_n$ is encoded into a reproduction word $\hat{x} = \hat{x}_1 \cdots \hat{x}_n$ by a fixed-to-fixed length code. The distortion between $x$ and $\hat{x}$ is defined by

$$d(x, \hat{x}) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{x}_i)^2.$$

The rate-distortion function of the memoryless Gaussian source with zero mean and unit variance under the squared-error criterion is of the form

$$R(\Delta) = \tfrac{1}{2} \log_2 \frac{1}{\Delta}, \qquad \Delta \in (0, 1] \qquad (2)$$

(See [1].) In the following section, it is shown that an asymptotically optimal code which achieves (2) can be generated with probability $1 - \varepsilon$ for any given distortion level $\Delta \in (0, 1]$.

Main  Results 

Since the probability density function (1) is spherically symmetric for any word $x$ of length $n$, it is natural to encode $x$ in two separate steps, i.e., 1) quantize the magnitude $\|x\|$ and 2) quantize the shape $x/\|x\|$, where $\|\cdot\|$ denotes the Euclidean norm. First, select an arbitrary set $\mathcal{A} = \{a_1, \dots, a_L\}$ satisfying the following four conditions:

C1) All elements must satisfy $0 = a_1 < a_2 < \cdots < a_L < \infty$,

C2) $a_L$ must satisfy $a_L \ge \sqrt{n}$,

C3) $\zeta = \max_{l} (a_{l+1} - a_l)$ must satisfy $\lim_{n \to \infty} (\zeta^2 / n) = 0$,

C4) $L$ must satisfy $\lim_{n \to \infty} [(\log_2 L)/n] = 0$.

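Let $\mathcal{Y} = \{y_1, \dots, y_M\}$ be a set satisfying $\|y_m\| = 1$ for all $m = 1, \dots, M$, that is, all the elements of $\mathcal{Y}$ belong to the $n$-dimensional unit hypersphere $S^{n-1}$. Encoding is defined as a mapping $\varphi : \mathbb{R}^n \to \mathcal{A} \times \mathcal{Y}$ specified by $\varphi_1$ and $\varphi_2$ such as

$$\varphi(x) = \varphi_1(x) \cdot \varphi_2(x), \qquad (3)$$

where $\varphi_1 : \mathbb{R}^n \to \mathcal{A}$ and $\varphi_2 : \mathbb{R}^n \to \mathcal{Y}$ are two mappings defined as follows:

$$\varphi_1(x) = a_l \quad \text{if } a_l \le \|x\| < a_{l+1},$$

$$\varphi_2(x) = y_m \quad \text{if } \|x - y_m\| = \min_{y \in \mathcal{Y}} \|x - y\|.$$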
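The two-step encoder can be sketched as below. Several choices are assumptions made for illustration: the level set is the instance $a_l = (l - 1)/\sqrt{n}$, $L = n + 1$ given later in the text, the shape codewords are drawn uniformly on the sphere, a magnitude at or above $a_L$ is mapped to $a_L$, and the distortion is taken per letter. The function names are hypothetical.

```python
import math
import random


def make_levels(n):
    """Levels a_l = (l - 1) / sqrt(n) for l = 1..L with L = n + 1."""
    return [l / math.sqrt(n) for l in range(n + 1)]


def random_shapes(n, M, rng):
    """M independent points on the unit sphere (uniformity is an assumption)."""
    shapes = []
    for _ in range(M):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in g))
        shapes.append([v / norm for v in g])
    return shapes


def encode(x, levels, shapes):
    """Return (quantized magnitude, nearest shape codeword) for x."""
    norm = math.sqrt(sum(v * v for v in x))
    # Largest level not exceeding ||x||; ||x|| >= a_L maps to a_L (assumed).
    gain = max(a for a in levels if a <= norm)
    shape = min(shapes, key=lambda y: sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
    return gain, shape


def distortion(x, gain, shape):
    """Per-letter squared error between x and the reproduction gain * shape."""
    return sum((xi - gain * yi) ** 2 for xi, yi in zip(x, shape)) / len(x)


rng = random.Random(0)
n, M = 8, 256
levels = make_levels(n)
shapes = random_shapes(n, M, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
gain, shape = encode(x, levels, shapes)
print(gain, distortion(x, gain, shape))
```

Because every shape codeword has unit norm, the codeword nearest to $x$ is also the one nearest to $x/\|x\|$, so the shape search does not need the normalized input.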

The rate $R$ and the average distortion $\bar{\Delta}$ of this code are given by

$$R = \frac{1}{n} \log_2 LM, \qquad (4)$$

$$\bar{\Delta} = \int_{\mathbb{R}^n} p(x)\, d(x, \varphi(x))\, dx, \qquad (5)$$

respectively. The following theorem evaluates the average distortion caused by the proposed code.

Theorem Let $\Delta \in (0, 1]$ be arbitrarily given. Let $M$ =
=  fx|5”~'|/(|5""*|AT“')],  and  select  an  arbitrary  set 
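$\mathcal{A}$ satisfying C1) ~ C4). If $M$ points on $S^{n-1}$ are chosen independently as elements of $\mathcal{Y}$, then for any $\delta > 0$ and $\varepsilon > 0$ there exists an integer $n_0 = n_0(\delta, \varepsilon)$ that satisfies the following two relations:

$$R < R(\Delta) + \delta$$

$$E[\bar{\Delta}] < \Delta + \varepsilon \qquad (6)$$

for all $n > n_0$, where $E[\bar{\Delta}]$ denotes the expectation of $\bar{\Delta}$ with respect to the choice of $\mathcal{Y}$. Moreover, for any $\varepsilon' > 0$ there exists an integer $n_1 = n_1(\varepsilon')$ that satisfies

$$V[\bar{\Delta}] < \varepsilon' \qquad (7)$$

for all $n > n_1$, where $V[\bar{\Delta}]$ denotes the variance of $\bar{\Delta}$ with respect to the choice of $\mathcal{Y}$.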

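For any given $\Delta \in (0, 1)$, the asymptotic behavior of this code is characterized in the following way:

$$R = R(\Delta) + O\left(\tfrac{1}{n} \log_2 L\right), \qquad (8)$$

$$\bar{\Delta} = \Delta + O\left(\zeta^2 / n\right). \qquad (9)$$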

It is easy to construct $\mathcal{A}$ that satisfies C1) ~ C4). For instance, by setting $L = n + 1$ and $a_l = (l - 1)/\sqrt{n}$ ($l = 1, \dots, L$), it is easy to check that this example satisfies C1) ~ C4) with $\zeta = 1/\sqrt{n}$.

Hence, $R$ converges to $R(\Delta)$ of order $O(\log_2 n / n)$ and $\bar{\Delta}$ converges to $\Delta$ of order $O(1/n^2)$. The paper [2] indicates a conjecture that the convergence of $O(\log_2 n / n)$ in $R$ and $O(1/n)$ in $\bar{\Delta}$ would be the tightest possible. Our result in (8) and (9), however, not only reveals that there exists a code that has a better asymptotic behavior, but also represents a more general trade-off relationship between the rate and the average distortion.

Summary

We present a unified theory of information, which naturally incorporates Shannon's theory of information transmission and the theory of identification in the presence of noise as extremal cases. It provides several novel coding theorems.


ON  THE  PROBABILITY  OF  UNDETECTED  ERROR  FOR  ITERATED  CODES 
Abstract

In practical applications of coding theory, e.g., ARQ systems, signature analysis, etc., the probability of undetected error, $P_u(\varepsilon)$, for linear block codes plays an important part. The exact value of $P_u(\varepsilon)$ can be calculated by using the weight distribution. However, for almost all codes, the weight distribution is not exactly known. Hence, some methods for calculating an upper bound on $P_u(\varepsilon)$ have been proposed. It is known, by deriving the average probability of undetected error for the ensemble of all $(n, k, d)$’ linear codes, that there exists an $(n, k, d)$ linear code having $P_u(\varepsilon)$ upper bounded by $2^{-(1-r)n}$, where $r = k/n$ [1]. Some codes, e.g., Hamming codes, satisfying $P_u(\varepsilon) \le 2^{-(n-k)}$ have been found for finite code length $n$ [3]. In these codes, $r \to 1$, and $P_u(\varepsilon)$ does not converge to zero for $n \to \infty$. Codes with $P_u(\varepsilon)$ converging exponentially to zero with order $n$ for $n \to \infty$ have also not been explicitly constructed. Of course, if $0 < r < 1$ and the asymptotic distance ratio $\delta$, $\delta > 0$, for $n \to \infty$, it is trivially shown that $P_u(\varepsilon)$ converges to zero. For example, Justesen codes [4] satisfy the above conditions. However, if $\delta = 0$, it is hard to show that $P_u(\varepsilon) \to 0$ as $n \to \infty$.
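When the weight distribution is known, the exact value follows from the standard expression $P_u = \sum_{j \ge 1} A_j\, p^j (1 - p)^{n - j}$ for a binary symmetric channel with cross-over probability $p$. The sketch below evaluates it for the (7, 4) Hamming code, whose weight distribution is $A_0 = 1$, $A_3 = 7$, $A_4 = 7$, $A_7 = 1$, and compares it with $2^{-(n-k)}$. The function name is an assumption made for this example.

```python
def undetected_error_probability(weights, p):
    """P_u on a BSC: sum over nonzero codeword weights j of A_j p^j (1-p)^(n-j).

    `weights[j]` is the number of codewords of Hamming weight j, so the
    list has n + 1 entries.
    """
    n = len(weights) - 1
    return sum(
        a_j * p ** j * (1.0 - p) ** (n - j)
        for j, a_j in enumerate(weights)
        if j > 0
    )


# Weight distribution of the (7, 4) Hamming code.
hamming_7_4 = [1, 0, 0, 7, 7, 0, 0, 1]
bound = 2.0 ** -(7 - 4)

for p in (0.01, 0.1, 0.3, 0.5):
    value = undetected_error_probability(hamming_7_4, p)
    print(p, value, value <= bound)
```

At $p = 1/2$ every nonzero codeword is equally likely as an error pattern, so the value is $(2^k - 1)/2^n = 15/128$, just below the bound $1/8$.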

In this paper, it is shown from the theoretical viewpoint that under the code rate $R_0$, $0 < R_0 < 1$, iterated codes with $P_u(\varepsilon)$ converging to zero can be explicitly constructed, although the asymptotic distance ratio $\Delta_0$, $\Delta_0 = 0$, for the code length $N_0$, $N_0 \to \infty$. It is also shown that there exist explicitly constructed codes whose $P_u(\varepsilon)$ converges exponentially to zero with order $N_0$, for $N_0 \to \infty$. Throughout this paper, we assume that the codes are binary linear block codes and the channel is the binary symmetric channel with cross-over probability $p$, $0 < p < 1/2$.

1.  CODING  AND  DECODING  METHODS 

A.  Coding  Method 

Let $\otimes$ denote the direct product. Then $(N_0, K_0, D_0)$ iterated codes are constructed by $c_1 \otimes c_2 \otimes \cdots \otimes c_s$, where $c_i$ is the $i$-th stage $(n_i, k_i, d_i)$ code, $i = 1, 2, \dots, s$.

B.  Decoding  Method 

Let $G_i(X)$, $Y_i(X)$, and $S_i(X)$ be the generator polynomial of the $i$-th stage code $c_i$, the polynomial of the subsequence with length $n_i$ of the received sequence at step $i$, and the polynomial of the syndrome of the subsequence with length $n_i$ at step $i$, respectively. Then, the decoding method of step $i$ is as follows.

Step $i$: After partitioning the subsequence with length $n_s n_{s-1} \cdots n_i k_{i-1} \cdots k_1$ of the received sequence with length $N_0$ into $n_s n_{s-1} \cdots n_{i+1} k_{i-1} \cdots k_1$ sequences with length $n_i$, $Y_i(X)$ of each partitioned sequence is successively divided by $G_i(X)$. If $S_i(X) \ne 0$, an error is detected. If all of the $n_s n_{s-1} \cdots n_{i+1} k_{i-1} \cdots k_1$ syndromes $S_i(X)$ are zero, go to step $i + 1$.

2. THE PROBABILITY OF UNDETECTED ERROR

"The research leading to this paper was partially supported by the Ministry of Education under Grant-in-Aid 04750364 for Scientific Research, and by Waseda University Grant for Special Research Project No 91A-I8S

*Kanagawa  Institute  of  Technology.  JAPAN 

'Sony  Corporation,  JAPAN 

•Waseda  University,  JAPAN 

'The $(n, k, d)$ code denotes the code of length $n$, number of information symbols $k$, and minimum distance $d$


Lemma 1: The upper bound of the probability of undetected error, $P_u^{(s)}(\varepsilon)$, for iterated codes $C_0^{(s)}$ is given by

-pr-’T]*'*'-'  (1) 

t=i  1=1 

where $A_{i,j}$ is the number of codewords of Hamming weight $j$ for the code $c_i$.

Lemma 2: The sufficient condition to construct iterated codes $C_0$ whose code rate $R_0$ satisfies $0 < R_0 < 1$ for $N_0 \to \infty$, that is, $s \to \infty$, is $r_i < r_{i+1}$, where $r_i = k_i/n_i$ and $i = 1, 2, \dots, \infty$.

Lemma 3: The asymptotic distance ratio $\Delta_0$ of iterated codes $C_0$ satisfying Lemma 2 is

$$\Delta_0 = \lim\, (D_0/N_0) = 0. \qquad (2)$$

From  lemmas  1,  2,  and  3,  we  have  the  following  theorem. 

Theorem 1: The probability of undetected error, $P_u(\varepsilon)$, for iterated codes $C_0$ is given by

$$P_u(\varepsilon) = \lim_{s \to \infty} P_u^{(s)}(\varepsilon) = 0, \qquad (3)$$

where $0 < R_0 < 1$, and $\Delta_0 = 0$ as $N_0 \to \infty$.
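To make the behaviour of the rate and the distance ratio concrete, the sketch below multiplies stage parameters together, using the standard fact that length, dimension and minimum distance of a direct product code are the products of the component values. The stage codes are assumed to be extended Hamming codes with parameters $(2^{m+i-1},\ 2^{m+i-1} - (m+i),\ 4)$, as in the first example of the next section; that parameter formula is a reading of a damaged line, and the function names are hypothetical.

```python
def extended_hamming_stage(m, i):
    """(n_i, k_i, d_i) of the assumed i-th stage extended Hamming code."""
    n_i = 2 ** (m + i - 1)
    return n_i, n_i - (m + i), 4


def iterated_parameters(stages):
    """(N_0, K_0, D_0) of the direct product of the given stage codes."""
    N, K, D = 1, 1, 1
    for n_i, k_i, d_i in stages:
        N *= n_i
        K *= k_i
        D *= d_i
    return N, K, D


m = 2
for s in range(1, 7):
    stages = [extended_hamming_stage(m, i) for i in range(1, s + 1)]
    N, K, D = iterated_parameters(stages)
    # Overall rate K/N and distance ratio D/N after s stages.
    print(s, N, K / N, D / N)
```

For these stage codes the stage rates $r_i = k_i/n_i$ increase with $i$, the overall rate settles at a positive value, and the ratio $D_0/N_0$ shrinks toward zero as stages are added, which is the situation the lemmas describe.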

3.  SOME  EXAMPLES 

Example 1: Iterated codes $C_0^{(s)}$ constructed by applying the $i$-th stage code $c_i$ with the $(2^{m+i-1},\ 2^{m+i-1} - (m + i),\ 4)$ extended Hamming code, where $m = 2, 3, \dots$. Note that these iterated codes are the error-free codes proposed by P. Elias [5].

Example 2: Iterated codes $C_0^{(s)}$ constructed by applying the $i$-th stage code $c_i$ with the $(2^{m+i-1},\ 2^{m+i-1} - 1,\ 2)$ even parity check code, where $m = 1, 2, \dots$.

Example 3: Iterated codes $C_0^{(s)}$ constructed by applying the $i$-th stage code $c_i$ with the $(2^{m+i} - 1,\ 2^{m+i} - (m + i + 1),\ 3)$ Hamming code, where $m = 2, 3, \dots$. Note that $P_u(\varepsilon)$ of these iterated codes

satisfy  P,(e)  < 

References 

[1] S. Lin and D. J. Costello, Jr., Error Control Coding: Fundamentals and Applications. Englewood Cliffs, NJ: Prentice-Hall, 1983.

[2] J. K. Wolf, A. M. Michelson, and A. H. Levesque, "On the probability of undetected error for linear block codes," IEEE Trans. Commun., vol. COM-30, pp. 317-324, 1982.

[3] J. Justesen, "A class of constructive asymptotically good algebraic codes," IEEE Trans. Inform. Theory, vol. IT-18, pp. 652-656, Sept. 1972.

[4] P. Elias, "Error-free coding," IRE Trans., vol. PGIT-4, pp. 29-37, 1954.




ON  THE  KEY  EQUATION  FOR 
n-DIMENSIONAL  CYCLIC 
CODES 

Hervé CHABANNE and Graham H. NORTON

Abstract 


We  are  now  ready  to  state  the  key  equation. 

Theorem 1 In $F((X^{-1}))$, we have $\sigma \Gamma_e = X \omega$, $\gcd(\sigma, \omega) = 1$.

Remark: We can consider Theorem 1 as a generalization of the key equation for BCH codes to $n$-dimensional cyclic codes. As for the cyclic case, the spectral behaviour of $\omega$ and $\sigma$ allows us to recover $e$:
• $\sigma$ equals 0 only on $E_e$,
• $\omega$ equals 0 only on $E_e \setminus \mathrm{supp}(e)$.


Let $R = K[X_1, \dots, X_n]/(X_1^{N_1} - 1, \dots, X_n^{N_n} - 1)$ be a semisimple algebra. Ideals in $R$ are known as $n$-dimensional cyclic codes or abelian codes.

Let $F$ be the smallest extension of $K$ containing an $N_v$-th primitive root of unity $\alpha_v$ for $v = 1, \dots, n$.

Let $e \in F[X_1, \dots, X_n]$ be a non-zero polynomial.

We consider the series $\Gamma_e(X_1^{-1}, \dots, X_n^{-1})$ in $F[[X_1^{-1}, \dots, X_n^{-1}]]$.

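We first introduce univariate polynomials $\sigma_v \in F[X_v]$, $v = 1, \dots, n$, and a multivariate polynomial $\omega \in F[X_1, \dots, X_n]$ such that

$$\sigma_1 \cdots \sigma_n\, \Gamma_e = X_1 \cdots X_n\, \omega, \qquad \gcd(\omega, \sigma_1 \cdots \sigma_n) = 1.$$

Thus, we show that the spectral behaviour of $\sigma = \sigma_1 \cdots \sigma_n$ and $\omega$ allows us to recover $e$.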

In the second part, we reinterpret the polynomials $\sigma$ and $\omega$, regarding $\Gamma_e$ as the generating function of the $n$-dimensional linear recurring sequence $e = (e(\alpha_1^{i_1}, \dots, \alpha_n^{i_n}))$.

Then we show how to obtain $\sigma_v$.

Hence, we deduce a new method for decoding abelian codes.
Notation 


X  := 

F(Xj 

F((X-M) 

i  ;  = 

s-„(i)  := 

ifl(i)  :  = 

i  k  o 

supp(^PiX‘)  := 

A(p)  := 

e(o*)  :  = 

S(p)  := 


A, . . .  .V„ 

F(A'.,A'i,...,A'„l. 

Laurent  series  in  X"'  over  F. 

iv 

(l  1 ,  .  .  .  ,  lu-.  1  f  ^v4-l  1  •  •  •  1  *n) 
iv  <  hn,  V  g  ll,n). 

{i  :  pi/OgF}. 

The  degree  of  p  g  (F{;^I){A''„J. 
e(oV . o^") 

(^i(p),.-,Mp)) 


1 The key equation.

Let $e \in F[X]$ be a non-zero polynomial. Our goal is to show how the series $\Gamma_e(X^{-1}) = \sum_{i \preceq 0} e(\alpha^{-i}) X^i \in F[[X^{-1}]]$ may be written as a quotient of two relatively prime polynomials.

Let $E_e$ be the smallest cartesian product containing $\mathrm{supp}(e)$.

Definition For $v \in [1, n]$, define the error-locator
A„-polynomial  by  £t„(A„)  =  FI, -  q'J)  €  FfA'.). 
We call $\sigma = \sigma(X) = \prod_{v=1}^{n} \sigma_v(X_v)$ the error-locator product polynomial of $e$. Finally, we call

oj  u;(X)  Ei€»'>pp(f )  { riust  nje^vtApapt^n.  ^  A  u  riy )) 

the  error-evaluator  polynomial  of  e. 

2 The linear recurring sequence context.

•As  in  [2],  we  let  55u(F)  denote  the  commutative  F-algebra  of 
(-N)"  -indexed  sequences  s  :  (— N)"  — »  F. 

The  generating  function  of  s  is  r,(X~^)  =  EixoSi^*- 
The  characteristic  ideal  of  s  g  5;o(F)  is 
y4nn(s)  =  {5Zi£s«pp</) /iX*  :  Vj  g  (— N)",  Eie»upp(/) =  0}- 
If $\mathrm{Ann}(s) \ne \{0\}$ then $s$ is called a linear recurring sequence. In section 1 we studied the sequence $e$ given by $e_i = e(\alpha^{-i})$. Clearly $e$ is a linear recurring sequence and $X_v^{N_v} - 1$ belongs to $\mathrm{Ann}(e) \cap F[X_v]$, $v \in [1, n]$. In fact, we can say more:

Theorem 2 For $v \in [1, n]$, $\sigma_v$ is the monic polynomial of minimal (positive) degree in $\mathrm{Ann}(e) \cap F[X_v]$.

We conclude this section by recalling how $\sigma_v$ may be computed.

Theorem  3  [2}  Let  f„  g  Ann(e)nFlA'„),  6vfv  >  1,  for  v  € 
[l,ii).  Put  dy  =  6(  /„).  For  i  g  (-N)""'  define  the  i- 

usl.u^v 

section  of  e,  €  i'<o(F)  i>y  ,  where  S^(k)  =  i  and 

»v(k)  =  j. 

Then  Ov  =  lent =  A»in(ep’''),0  ;:<  i  ^  dv  -  !}• 

3  Decoding  algorithm. 

If  C  is  an  ideal  of  R,  we  know  that 

c g (7 c(o;' . a;- )  =  0,  v(t,, ...,i„)eZc 

where  Zc  is  a  set  which  only  depends  on  C  [Ij. 

From  tn  =  c-bc,  c  g  C,  we  can  get  the  parts  of  Fj  corresponding 
to  Zcr. 

This  yields  the  following  algorithm 

(1)  From  the  known  terms  of  find 

•  jrp''’  such  that  =  /4nn(e,^*’'')  (used  in  point  2), 

•  the  missing  terms  of  (used  in  point  3). 

(2)  For  i>  g  ll,ii],  put  Oy  =  /cm(jfi^*'’*)  (Theo.  3),  and  <t  =  cr„ 

i>=i 

(3) Put $\omega = \sigma \Gamma_e / (X_1 \cdots X_n)$ (Theo. 1)

(4) From $\sigma$ and $\omega$ deduce $e$ (Remark on p. 1)

ABSTRACT

It is known that cyclic codes of composite length are equivalent to generalized concatenated codes with inner cyclic codes and outer cyclic or constacyclic codes. Here it is shown that there is a large class of extended cyclic codes of length $Q^m$ that can be constructed by generalized concatenation of shorter extended cyclic codes. This class includes the generalized Reed-Muller codes and the Euclidean geometry codes. In many cases a simple multistage decoder corrects all errors of weight less than half of the true minimum distance of the code.

SUMMARY 

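We start with some notation. Let $(a_{j_1, j_2, \dots, j_m})$ denote a $Q \times Q \times \cdots \times Q$ array over $GF(q)$, where $Q = q^s$ for some integer $s > 0$. We associate a polynomial in $m$ variables

$$A(x_1, x_2, \dots, x_m) = \sum_{j_1=0}^{Q-1} \sum_{j_2=0}^{Q-1} \cdots \sum_{j_m=0}^{Q-1} A_{j_1, j_2, \dots, j_m}\, x_1^{j_1} x_2^{j_2} \cdots x_m^{j_m}$$

with each array $(a_{j_1, j_2, \dots, j_m})$. Suppose $\alpha_0, \alpha_1, \dots, \alpha_{Q-1}$ are the elements of $GF(Q)$. Then the associated polynomial is uniquely determined by the following equation:*

$$A(\alpha_{i_1}, \alpha_{i_2}, \dots, \alpha_{i_m}) = a_{i_1, i_2, \dots, i_m}, \qquad 0 \le i_1, i_2, \dots, i_m < Q.$$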

Definition 1 Let $J$ be a subset of $\{(j_1, j_2, \dots, j_m) \mid 0 \le j_1, j_2, \dots, j_m < Q\}$. An array over $GF(q)$ is a codeword in the code $C(q, s, m, J)$ iff its associated polynomial has coefficients $A_{j_1, j_2, \dots, j_m}$ equal to zero for each $(j_1, j_2, \dots, j_m) \in J$. The set $J$ will be called the zero set of code $C(q, s, m, J)$.

Considering the conjugacy constraints we get the complete zero set $\bar{J}$:

$$\bar{J} = \{(r_Q(q^h j_1), r_Q(q^h j_2), \dots, r_Q(q^h j_m)) \mid (j_1, j_2, \dots, j_m) \in J,\ 0 \le h < s\},$$

where $r_Q(l)$ is given by

$$r_Q(l) = \begin{cases} \text{the number in } \{0, 1, \dots, Q-2\} \text{ which is congruent to } l \text{ modulo } Q-1, & \text{if } l \not\equiv 0 \bmod Q-1, \\ 0, & \text{if } l = 0, \\ Q-1, & \text{if } l \equiv 0 \bmod Q-1 \text{ and } l \ne 0. \end{cases}$$

If $(a_{j_1, j_2, \dots, j_m})$ is a codeword in the code $C(q, s, m, J)$, all coefficients $A_{j_1, j_2, \dots, j_m}$, $(j_1, j_2, \dots, j_m) \in \bar{J}$, of its associated polynomial are equal to zero. The number of information symbols $K$ can be calculated by the formula:

$$K = Q^m - |\bar{J}|.$$
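A small sketch may help in reading the definitions of $r_Q$, the complete zero set and $K$. It follows the case distinction as reconstructed above, with $h$ running over $0, \dots, s-1$; both of these are readings of a damaged passage and should be treated as assumptions, and the function names are hypothetical.

```python
def r_Q(l, Q):
    """Reduce l modulo Q - 1, keeping 0 for l = 0 and Q - 1 for other multiples."""
    if l == 0:
        return 0
    if l % (Q - 1) == 0:
        return Q - 1
    return l % (Q - 1)


def complete_zero_set(J, q, s):
    """Close the zero set J under multiplication of every index by q^h, 0 <= h < s."""
    Q = q ** s
    return {
        tuple(r_Q(q ** h * j, Q) for j in index)
        for index in J
        for h in range(s)
    }


def information_symbols(J, q, s, m):
    """K = Q^m - |complete zero set|, with Q = q^s."""
    return (q ** s) ** m - len(complete_zero_set(J, q, s))


# Hypothetical example: q = 2, s = 2 (so Q = 4), m = 1, zero set {(1,)}.
J = {(1,)}
print(sorted(complete_zero_set(J, 2, 2)))
print(information_symbols(J, 2, 2, 1))
```

Here the index 1 is joined by its conjugate 2, so the complete zero set has two elements and $K = 4 - 2 = 2$.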


It can be shown that, if the zero set $J$ satisfies some condition, the dual code of a $C(q, s, m, J)$ code also belongs to the class of codes defined above. All $C(q, s, m, J)$ codes with $m > 1$ can be constructed by generalized concatenation of extended cyclic codes of length $Q = q^s$. If $m > 2$, the outer codes of the generalized concatenated code (GC code) again are GC codes.

It is readily seen for generalized Reed-Muller codes and it can be proved for Euclidean geometry codes that both classes belong to the $C(q, s, m, J)$ codes. We want to show that there are other extended cyclic codes which are equivalent to $C(q, s, m, J)$ codes.


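Let $\{\xi_1, \xi_2, \dots, \xi_m\}$ be a basis of $GF(Q^m)$ over $GF(Q)$. Then every element in $GF(Q^m)$ can be represented by a sum $x_1 \xi_1 + x_2 \xi_2 + \cdots + x_m \xi_m$, where $x_1, x_2, \dots, x_m \in GF(Q)$. We denote the index sets $\mathcal{L}(h)$ and $\mathcal{L}'(h)$:

$$\mathcal{L}(h) = \{(j_1, j_2, \dots, j_m) \mid 0 \le j_1, j_2, \dots, j_m < Q,\ j_1 + j_2 + \cdots + j_m > h\},$$

$$\mathcal{L}'(h) = \{j_1 + j_2 Q + \cdots + j_m Q^{m-1} \mid (j_1, j_2, \dots, j_m) \in \mathcal{L}(h)\}.$$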

'W«  define  0»«liiid(>*»0,lf/>0.  FVnthenaon  (0  -  1)  ei  1  in  OF(0). 

Theorem 1 Suppose $s = 1$ and $A(x_1, x_2, \dots, x_m)$ is the polynomial associated with an array $(a_{j_1, j_2, \dots, j_m})$ over $GF(q) = GF(q^s) = GF(Q)$. Let $B(x)$ be a polynomial over $GF(Q^m)$ with degree less than $Q^m$ such that for all $x_1, x_2, \dots, x_m \in GF(Q)$

$$B(x_1 \xi_1 + x_2 \xi_2 + \cdots + x_m \xi_m) = A(x_1, x_2, \dots, x_m).$$

Then:

$$\forall (j_1, j_2, \dots, j_m) \in \mathcal{L}(h) : A_{j_1, j_2, \dots, j_m} = 0 \iff \forall j \in \mathcal{L}'(h) : B_j = 0$$

Corollary 2 The codes $C(q, sm, 1, \mathcal{L}'(h))$ and $C(q, s, m, \mathcal{L}(h))$ are equivalent.

Clearly, if two codes are equivalent, their dual codes are also equivalent.

Since the $C(q, s, m, J)$ codes, $m > 1$, are GC codes, a multistage decoder can be used and it corrects all errors of weight less than half of $d_{GC}$. $d_{GC}$ denotes the well known lower bound (see for example [3], p. 591) for the minimum distance of GC codes.

In Table 1 some extended cyclic codes of length 64 are tabulated, which are the dual codes of some codes $C(q, sm, 1, \mathcal{L}'(h))$. They can be constructed by generalized concatenation of codes of length 4 and 8. The number $k$ of information symbols, the minimum distance $d$, the distance bound $d_{GC}$, the length $q^s$ of the codes which are used in the generalized concatenation, and the exponents of the roots of the generator polynomial are given. Especially interesting is the extended cyclic (64, 28, 16) code, since a modified multistage decoder can correct all errors of weight less than half of the true minimum distance.


k    d    d_GC   q^s   exponents of the roots          remark
?    ?    6      ?     16,27,31                        EG-Code
?    8    8      ?     15,23,31                        BCH-Code
?    ?    ?      ?     7,15,21,23,31                   EG-Code
?    ?    ?      8     7,15,21,23,27,31
?    ?    ?      8     7,13,15,21,23,27,31
?    ?    ?      4     7,11,13,15,23,27,31             BCH-Code
?    ?    ?      4     ?                               EG-Code
?    ?    ?      ?     3,5,7,11,13,15,21,23,27,31      BCH-Code

Table 1: Some binary extended cyclic codes of length 64 which are equivalent to GC codes (entries marked ? are illegible in the source)

ABSTRACT

It is known that when the blocklengths of two cyclic codes are relatively prime their product code is cyclic when serialized using the Chinese Remainder Theorem. When the blocklengths are equal we characterize product codes that are cyclic when serialized either rowwise or columnwise.


SUMMARY 

Given two codes $C_1$ and $C_2$, their product code $C$ is the code whose codewords are the two-dimensional arrays for which columns are codewords in $C_1$ and rows are codewords in $C_2$. The component codes $C_1$ and $C_2$ may be cyclic but the product code is not necessarily cyclic. If it is cyclic, with appropriate serialization, then it is called a cyclic product code. It is known that when the blocklengths $n_1$ and $n_2$ of $C_1$ and $C_2$ are relatively prime then $C$ is cyclic when serialized using the Chinese Remainder Theorem [1]. We identify cyclic product codes when the blocklengths or the component cyclic codes are equal.

Consider a two-dimensional $n_0 \times n_1$ array with entries from $GF(q)$:

$$\begin{matrix} a_{0,0} & a_{0,1} & \cdots \\ a_{1,0} & a_{1,1} & \cdots \\ \vdots & & \\ a_{n_0-1,0} & \cdots & \end{matrix}$$

By associating this array to an element of the group algebra $KG$, where $K = GF(q)$ and $G$ is an Abelian group of order $n = n_0 n_1$, $(n, q) = 1$, which is a direct product of two cyclic subgroups of order $n_0$ and $n_1$ with their respective generators, one can interpret Abelian codes, which are ideals of $KG$, as product codes. With this interpretation the problem reduces to identifying the Abelian codes which are closed under cyclic shifts. This class of Abelian codes is the same as the separable Abelian codes [2].

Abelian codes are characterized in the transform domain
as follows. Define a mapping $f$ from $\Omega = \{0, 1, \ldots, n-1\}$
to $\Omega_0 \otimes \Omega_1$, where
$\Omega_0 = \{0, 1, \ldots, n_0-1\}$ and $\Omega_1 = \{0, 1, \ldots, n_1-1\}$,
by $f(i) = (i_0, i_1)$ where
$i = i_0 + i_1 n_0$, $i \in \Omega$, $i_0 \in \Omega_0$, $i_1 \in \Omega_1$. $(i_0, i_1)$ is
called the mixed-radix representation of $i$ and the
mapping $f$ is called the mixed-radix serialization (MRS).
(MRS corresponds to serializing the product code
columnwise.) For a given $(i_0, i_1)$ the subset

$$\{(i_0, i_1),\ q(i_0, i_1),\ q^2(i_0, i_1),\ \ldots,\ q^{e-1}(i_0, i_1)\},$$

where for any integer $a$, $a(i_0, i_1)$ is defined to be
$(a i_0 \bmod n_0,\ a i_1 \bmod n_1)$ and
$q^e(i_0, i_1) = (i_0, i_1)$, is called the conjugacy class
containing $(i_0, i_1)$. The appropriate Discrete Fourier
Transform (DFT) is given by

$$A_{j_0,j_1} = \sum_{i_0=0}^{n_0-1} \sum_{i_1=0}^{n_1-1} a_{i_0,i_1}\, \alpha_0^{i_0 j_0}\, \alpha_1^{i_1 j_1},$$

where $\alpha_0$ and $\alpha_1$ are elements of orders $n_0$ and $n_1$ in the
extension field. An Abelian code can be defined as the
set of n-tuples whose DFT coefficients are zero in
specified conjugacy classes [3].
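
As an illustration of the mixed-radix serialization and the conjugacy classes described above, the following is a minimal sketch. The function names are hypothetical, and it assumes $(n_0 n_1, q) = 1$ so that multiplication by $q$ permutes the index pairs.

```python
def mixed_radix(i, n0, n1):
    """Map i in {0, ..., n0*n1 - 1} to (i0, i1) with i = i0 + i1*n0."""
    return i % n0, i // n0


def conjugacy_class(pair, q, n0, n1):
    """Orbit of (i0, i1) under repeated componentwise multiplication by q."""
    cur = (pair[0] % n0, pair[1] % n1)
    orbit = []
    while cur not in orbit:
        orbit.append(cur)
        cur = ((q * cur[0]) % n0, (q * cur[1]) % n1)
    return orbit


def all_conjugacy_classes(q, n0, n1):
    """Partition the index pairs into conjugacy classes, visited in MRS order."""
    seen, classes = set(), []
    for i in range(n0 * n1):
        pair = mixed_radix(i, n0, n1)
        if pair not in seen:
            cls = conjugacy_class(pair, q, n0, n1)
            seen.update(cls)
            classes.append(cls)
    return classes


if __name__ == "__main__":
    # Binary field, 3 x 3 arrays: one class of size 1 and four of size 2.
    for cls in all_conjugacy_classes(2, 3, 3):
        print(cls)
```

An Abelian code in this picture is specified by choosing which of these classes carry zero DFT coefficients.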

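For the case when $n_0 = n_1 = n$, the expressions for the inverse
DFT for the Abelian and cyclic cases are, respectively,

$$a_{i_0,i_1} = \frac{1}{n^2} \sum_{j_0=0}^{n-1} \sum_{j_1=0}^{n-1} A_{j_0,j_1}\, \alpha_0^{-i_0 j_0}\, \alpha_1^{-i_1 j_1}$$

and

$$a_i = \frac{1}{n^2} \sum_{j_0=0}^{n-1} \sum_{j_1=0}^{n-1} A_j\, \alpha^{-ij}, \qquad j = j_0 + j_1 n \ \text{(using MRS)},$$

where $\alpha$ is an element of order $n^2$.
We can take $\alpha_0 = \alpha_1 = \alpha^n$ by working in the same extension
field for both cases. Now, starting with the transform
vector of the idempotent generator of a cyclic code in both
cases, and then replacing it by the transform vector
corresponding to the shift $(k_0, k_1)$ in the Abelian case
and to $k$ cyclic shifts in the cyclic case, we can find conditions
on $j_0$ and $j_1$ for which the left-hand sides of both
expressions are the same for all $i_0$, $i_1$, $k_0$ and $k_1$. This leads
to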

Theorem 1: A product code of two cyclic codes of equal
length $n$ is cyclic when serialized columnwise (rowwise)
iff the Abelian code obtained by serializing rowwise
(columnwise) has an idempotent generator whose DFT
vector is the same as that of an idempotent generator of a
cyclic code of length $n^2$.

Extending the above approach to $r$-dimensional ($r > 2$)
product codes, the following theorem can be proved.
Theorem 2: When codewords are written in terms of monomials
of the form

$$x_0^{i_0} x_1^{i_1} \cdots x_{r-1}^{i_{r-1}}$$

then the idempotent generators of cyclic product codes
(under  MRS)  are  the  direct  sums  of  the  generators  given 
by 

I  I  ...  Z  Z  Xj  ...Xj'.'l  xi*0  Sk  ir  -  I .  for  some  <i^O. 


.<0  1,(0  <1.1(0  <t<Ci 


or  equivalently  [4], 

( z  x’.'ir  I  x'/],./  X  xiviV  z  x;*)  f 


or  some  i  *  0. 
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This  paper  discusses  the  algebraic  structure  of 
cascaded  Reed-Solomon  (CRS)  codes,  and  presents 
an  algorithm  for  decoding  them.  A  CRS  code  is  a 
cascade  (or  “generalized  concatenated”)  code  con¬ 
structed  using  Reed-Solomon  codes  as  component 
codes.  In  particular,  we  consider  hyperbolic  CRS 
(HCRS)  codes:  these  are  CRS  codes  designed  to 
have  the  minimum  distance  given  by  the  cascade 
code  bound.  Compared  to  Reed-Solomon  codes  over 
the same alphabet, HCRS codes have longer blocklengths.
Compared to other two-dimensional cyclic
codes  (products  of  Reed-Solomon  codes,  duals  of 
such  products,  and  codes  proposed  by  Sakata  [1]) 
with  the  same  minimum  distance,  HCRS  codes  have 
higher  rates. 

Consider the finite field $F_q$ with $q$ elements, and
let $n = q - 1$, so that there is an element $\alpha$ of $F_q$
which is a primitive $n$th root of unity. We define
$F_q^{n \times n}[x,y] = F_q[x,y]/(x^n - 1,\ y^n - 1)$. Then for each
value of $d$ we define the set of parity-check points:

$$Z_d = \{(\alpha^i, \alpha^j) \in F_q^2 : (i+1)(j+1) < d\}.$$

Finally, we define a code $\mathrm{HCRS}_d$ consisting of codewords
$f$ which vanish at each parity-check point:
$\mathrm{HCRS}_d = \{f \in F_q^{n \times n}[x,y] : f(x,y) = 0$ for all
$(x,y) \in Z_d\}$. The minimum distance of $\mathrm{HCRS}_d$ is
the value $d$ given by the cascade-code bound [2-4].
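
To make the hyperbolic shape of the parity-check set concrete, the sketch below enumerates the exponent pairs $(i, j)$ with $(i+1)(j+1) < d$ and $0 \le i, j < n$. The function names are hypothetical. The dimension count assumes that each parity-check point removes exactly one degree of freedom, which holds when codewords are viewed through their two-dimensional Fourier transform.

```python
def hyperbolic_check_exponents(n, d):
    """Exponent pairs (i, j), 0 <= i, j < n, with (i + 1) * (j + 1) < d."""
    return [(i, j) for i in range(n) for j in range(n)
            if (i + 1) * (j + 1) < d]


def hcrs_dimension(n, d):
    """Dimension n^2 - |Z_d|, assuming one independent check per point."""
    return n * n - len(hyperbolic_check_exponents(n, d))


if __name__ == "__main__":
    q = 8              # illustrative field size (an assumption)
    n = q - 1
    for d in (3, 5, 9):
        checks = hyperbolic_check_exponents(n, d)
        print(d, len(checks), hcrs_dimension(n, d))
```

For comparison, a full square of check exponents with the same corner would need far more points than the hyperbolic region, which is where the rate advantage comes from.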

We present two encoding schemes for CRS codes:
a transform-based frequency-domain encoder, and a
systematic time-domain encoder which makes use of
a Gröbner basis for the code.

We now describe a decoding algorithm which corrects
$t$ errors for the code $\mathrm{HCRS}_{2t+1}$. The decoder
receives  a  corrupted  version  c(x,y)  +  e(x,y)  of  the 
codeword  c(x,y),  and  deduces  the  error  polynomial 
e(x,y)  by  determining  its  Fourier  transform,  E.  Ini¬ 
tially  the  decoder  knows  only  the  entries  of  E  corre¬ 
sponding  to  syndromes,  and  calculates  the  remaining 
entries by finding two-dimensional linear recursion
relations (2DR) which hold on all of E. The set of all
2DR relations valid on the error transform array forms
an ideal called the error locator ideal because its zeros
determine  the  error  locations.  The  algorithm  recur¬ 
sively  iterates  through  the  entries  of  E  according  to 
the  pure  lexicographic  order,  maintaining  a  set  G  of 


2DR relations known to be valid on the processed part
of the error transform array. Upon termination, G
is a Gröbner basis for the error locator ideal. For
each  entry  of  E  that  is  a  syndrome,  the  algorithm 
performs  a  validation  step,  and  for  unknown  entries 
of  E  the  algorithm  performs  a  calculation-validation 
step.  The  validation  step  is  the  same  as  the  main  step 
in  Sakata’s  algorithm  [1]:  each  2DR  of  G  is  checked 
for  validity  at  this  entry,  and  replaced  if  it  proves 
to  be  invalid.  In  the  calculation-validation  step,  the 
algorithm  first  calculates  the  entry  of  E,  then  per¬ 
forms  a  validation  step.  Each  relation  in  G  predicts 
a  value  for  the  unknown  entry.  We  show  that  only 
one of these predicted values is consistent with an error
pattern e(x,y) of weight t or less. Moreover, an
incorrect  prediction  is  detected  immediately  in  the 
subsequent  validation  step,  so  the  entry  is  effectively 
calculated  by  trying  each  of  the  predictions  in  turn 
until  there  is  no  inconsistency. 
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The  Polynomial  of  correctable 
patterns  of  concatenated  codes 

Nicolas Sendrier

Abstract 

The  polynomial  of  correctable  patterns  is  defined  in  [1]  as  the  weight 
enumerator  of  the  set  of  error  patterns  correctable  by  a  given  decoding 
algorithm.  The  polynomials  of  uncorrectable  and  miscorrected  patterns 
can  be  defined  as  well  ([5]  and  [6]). 

These polynomials allow a compact representation of a decoding
algorithm which is sufficient to compute the correction probability and
the miscorrection probability through a memoryless symmetric channel.
These results are generalised in [5] and [6] for erasure channels.

Our  purpose  here  is  to  compute  the  polynomials  of  correctable  pat¬ 
terns  of  concatenated  codes  for  different  decoding  algorithms. 

•  We  give  the  weight  distribution  of  the  error  patterns  correctable 
by  the  standard  decoding  algorithm. 

•  We  give  bounds  for  the  weight  distribution  of  the  error  patterns 
correctable  by  Block-Zyablov  algorithm. 

This  new  method  for  evaluating  concatenated  codes  will  thus  provide 
an  efficient  way  to  evaluate  the  standard  algorithm.  It  will  also  give  a 
way  to  evaluate  with  precision  the  performances  of  the  Block-Zyablov 
decoding  algorithm  which  needed,  up  to  now,  a  (much  more  expensive) 
simulation. 

As an example, we will compute the decoding performances of the
concatenation of the Nordstrom-Robinson ([4, p. 73]) inner code, and
a Reed-Solomon (255,223,33) outer code over $F_{256}$.

1  Polynomials  of  correctable  patterns 

We denote by $F$ the finite field $F_q$. Let $C(n,k,d)$ be a linear code
over $F$ of length $n$, dimension $k$ and minimum distance $d$.

We consider a transmission channel where errors and erasures
may occur simultaneously. We represent an erasure by the symbol
$\infty$ and we denote by $\bar{F}$ the set $F \cup \{\infty\}$. Let $\gamma$ be a decoding
algorithm for $C$ for such a channel. The erasure weight $\rho_H(y)$ of
an element $y$ of $\bar{F}^n$ is the number of its components equal to $\infty$,
and its error weight $w_H(y)$ is equal to the number of its components
different from 0 and $\infty$. We call extended weight enumerator of
$E \subset \bar{F}^n$ the polynomial

$$\sum_{y \in E} X^{w_H(y)}\, Y^{\rho_H(y)}\, Z^{\,n - w_H(y) - \rho_H(y)}.$$

The polynomials of correctable, uncorrectable and miscorrected
patterns of $\gamma$ [1, 5] are respectively the extended
weight enumerators of $\{y \in \bar{F}^n, \gamma(y) = 0\}$, $\{y \in \bar{F}^n, \gamma(y) \text{ fails}\}$
and $\{y \in \bar{F}^n, \gamma(y) \in C \setminus \{0\}\}$. They are denoted $P_0(X,Y,Z)$,
$P_1(X,Y,Z)$ and $P_2(X,Y,Z)$.

If $\gamma$ corrects errors alone these definitions apply with $Y = 0$.

Relationship with the error probability: if a codeword of $C$
is transmitted through a $q$-ary symmetric erasure channel with
transition probability $p_1$ and erasure probability $p_\infty$ and then decoded
by $\gamma$, the probability of correction is:

$$P_{corr} = P_0\bigl(p_1,\ p_\infty,\ 1 - (q-1)p_1 - p_\infty\bigr).$$

Similar statements hold for failure probability and miscorrection
probability and the polynomials $P_1(X,Y,Z)$ and $P_2(X,Y,Z)$.
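
The relationship above can be illustrated with a minimal sketch for the errors-only case ($Y = 0$). It assumes, purely for illustration, a bounded-distance decoder that corrects exactly the error patterns of weight at most $t$, so that $P_0(X,Z) = \sum_{i \le t} \binom{n}{i}(q-1)^i X^i Z^{n-i}$. The function names and the dictionary representation are hypothetical.

```python
from math import comb


def bounded_distance_p0(n, t, q):
    """Coefficients {(i, n - i): count} of P_0(X, Z) for a decoder that
    corrects every error pattern of weight <= t (an assumed decoder)."""
    return {(i, n - i): comb(n, i) * (q - 1) ** i for i in range(t + 1)}


def evaluate(poly, x, z):
    """Evaluate a two-variable polynomial stored as {(a, b): coeff}."""
    return sum(c * x ** a * z ** b for (a, b), c in poly.items())


def correction_probability(poly, p1, q):
    """P_corr = P_0(p1, 1 - (q - 1) * p1) on a q-ary symmetric channel."""
    return evaluate(poly, p1, 1 - (q - 1) * p1)


if __name__ == "__main__":
    # Binary code of length 7 with a single-error-correcting decoder.
    p0 = bounded_distance_p0(7, 1, 2)
    print(correction_probability(p0, 0.01, 2))
```

The same evaluation applies to any decoder once the coefficients of its polynomial of correctable patterns are known; only the construction of the coefficient table changes.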

INRIA, Domaine de Voluceau, Rocquencourt, BP 105, 78153 Le Chesnay
CEDEX, FRANCE


2  Concatenated  codes 

Let $B(n,k,d)$ be a linear (inner) code over $F_q$ and let $C(N,K,D)$
be a linear (outer) code over $F_{q^k}$. Let $t = \lfloor (d-1)/2 \rfloor$.

The concatenated code of $B$ and $C$, denoted $B \circ C$, is the set of all
the codewords of $C$ whose components are replaced by codewords
of $B$. It is a linear code over $F_q$ of length $nN$, of dimension
$kK$ and minimum distance $\ge dD$ [3]. A codeword can be seen as
a succession of $N$ inner codewords.

Let $\gamma$ be an error correcting algorithm for $B$ and let $P_0(X,Z)$,
$P_1(X,Z)$ and $P_2(X,Z)$ be its polynomials of correctable,
uncorrectable and miscorrected patterns. Let $\Phi$ be an error and erasure
correcting algorithm for $C$ and let $Q_0(X,Y,Z)$, $Q_1(X,Y,Z)$ and
$Q_2(X,Y,Z)$ be its polynomials of correctable, uncorrectable and
miscorrected patterns.

The standard decoding algorithm $\Gamma$ consists in decoding all
$N$ inner codewords, then the outer codeword. Roughly we can
denote it $\Gamma = \Phi \circ \gamma^N$.

Its polynomial of correctable patterns is equal to

$$Q_0\bigl(P_2(X,Y),\ P_1(X,Y),\ P_0(X,Y)\bigr).$$

The Block-Zyablov algorithm $\Psi$ will use $t+1$ inner decoders:
for $0 \le i \le t$, $\gamma_i$ will only decode error patterns of weight $\le i$
[2]. Roughly we can define $\Psi(y) = \mathrm{best\_of}_{0 \le i \le t}(\Gamma_i(y))$, where
$\Gamma_i = \Phi \circ \gamma_i^N$.

Its polynomial of correctable patterns verifies

/io(X,r)r<  inJ^*'lQ(A,Fo . y,),  (1) 

(•0 . ».)€«•+•  >=0 

min  (s;)  <  D 
0<;<1  ■' 

with  Q(A',Fo . Y,)  = 

/  (  i-i  It  i-i  1 

[■£  PoAX,  T)  n  V;  +  n  +E  11  11 

\i=0  j=0  j=0  i=0  j=0  i=i  j 

where $P_{0,i}(X,Y)$ is the monomial of degree $i$ of $P_0(X,Y)$ and
$P_{2,i}(X,Y)$ is the weight enumerator of the miscorrected error
patterns $y \in F^n$ such that $d_H(y,\gamma(y)) = i$.

Notations for (1):

$[\prod_{i=0}^{t} Y_i^{s_i}]\,Q(\ldots)$ is the coefficient of $Y_0^{s_0} \cdots Y_t^{s_t}$ in $Q(X, \ldots)$.
$P(X,Y) \preceq g(X,Y) \Leftrightarrow \forall i,j,\ [X^i Y^j]P(X,Y) \le [X^i Y^j]g(X,Y)$
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Summary 

Shannon showed that arbitrarily low error (ALE) can
be achieved on a channel when the information rate is less
than the channel capacity. Two problems emerge in this
respect. The first is concerned with the development of codes
which can achieve ALE; we refer to such a code as an error-free
code [1]. The second is concerned with the design of
practical bandlimited efficient systems which perform with low
probability of error while retaining a decoder which is of
reasonable complexity. It was shown by Feinstein for discrete
memoryless channels (DMC) that the average probability of
error, $\bar{P}_e$, can be bounded exponentially in the length of the
code, $n$, for rates $r$ less than capacity. For such channels we
define a good code as one which has an average probability
of error less than a quantity $e^{-nE(r)}$, where $E(r)$ is a
function of $r$ and greater than 0. Note that an error-free code
is not necessarily good, although a good code is error-free.

In 1953 Fano showed that orthogonal signaling could
be used to assure a good coding system on the infinite
bandwidth Gaussian channel for rates less than capacity.
In 1954 Elias [1] presented an error-free coding system for
the binary symmetric channel (BSC). In 1966 Schalkwijk
and Kailath [2] presented a constructive coding scheme for
the additive Gaussian noise channel (AGNC) when feedback
is allowed for rates less than capacity. Justesen [3] in
1972 found constructive concatenated codes for the BSC in
which  lim  inf(d/n)  >  0,  d  being  the  minimum  distance  of 
the  code.  It  can  be  shown  that  this  also  implies  the  codes 
are  good  codes  in  the  sense  defined  above.  In  1982  Delsarte 
and  Piret[4]  generalized  Justesen’s  construction  to  demon¬ 


strate  a  good  coding  system  for  rates  less  than  capacity  for 
a  class  of  channels  called  regular  DMC,  which  include  the 
class  of  symmetric  DMC.  In  this  paper  we  further  general¬ 
ize  the  Delsarte  and  Piret  construction  and  prove  that  it  is 
good  for  arbitrary  DMC  for  rates  less  than  capacity.  Other 
channels  for  which  the  construction  is  good  are  discussed. 
The AGNC without feedback is examined. It is shown how
to construct a good code which will signal with any rate
less than $\frac{1}{2}\log(1 + \mathcal{E}/N)$, where the average input power $S$ is
constrained by $S \le \mathcal{E}$ and $N$ is the average noise power.
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Abstract:  This  paper  proposes  and  investigates  a  method  of  designing 
pseudo-noise (PN) sequences having equivalently good properties of both even
and odd correlations, i.e. EOE sequences, which are important for acquisition
and demodulation in spread-spectrum (SS) communications. The odd correlation
property of PN sequences should be designed as well as their even or periodic
correlation property so that interference from data-modulated PN sequences can
be made as small as possible. We describe a method of designing polyphase
EOE sequences from biphase PN sequences with a good aperiodic correlation
property under a certain condition. The absolute values of both even and odd
correlation functions of these sequences at each shift can be equal. We evaluate
properties of the derived sequences by peaks of crosscorrelations and
out-of-phase autocorrelations and by BER. Numerical evaluation shows that the
polyphase sequences have lower peaks of crosscorrelations and out-of-phase
autocorrelations and lower BER than the biphase ones. Furthermore, the
generalized odd correlation function, which is important in polyphase SS
systems, is defined, and a method to improve the generalized odd correlation
properties is derived.


For instance, x and y are called EOE-Gold sequences when u and v are
Gold sequences, or EOE-Bent sequences when u and v are Bent sequences.

Evaluation of Performance

We evaluate correlation and BER properties of EOE sequences. Table 1
shows the distribution of the maximum sidelobe of the autocorrelation
functions of EOE-Gold and Gold sequences. Table 2 shows the distribution of
the maximum values of the crosscorrelation functions over every pair in the
sets of EOE-Gold and Gold sequences.

It is confirmed that the BER of systems using EOE sequences is improved over
that of systems using the original biphase sequences, except when
$E_b/N_0$ is low.

EOE sequences are generally useful for biphase SS systems but not always
useful for polyphase SS systems. Moreover, we consider modifying EOE
sequences to improve the performance for polyphase SS systems. Hence we define
the generalized odd correlation function for M-phase SS systems and describe
a method of improving the generalized odd correlation function.


Introduction 

In asynchronous code division multiple access (CDMA) based on spread
spectrum techniques, spreading sequences should have low crosscorrelation to
reduce co-channel interference and increase user capacity, as well as sharp
autocorrelation to accomplish reliable acquisition. Moreover, the odd
correlation function represents the output of the correlator in the case where
the data symbol changes during the integration of the correlation operation,
while the even correlation function represents the output in the case where the
data symbol remains constant over two consecutive symbols. Since the even and
odd correlation functions appear with equal probability, both functions are of
equal importance.

This paper proposes and investigates a method of designing pseudo-noise
(PN) sequences having equivalently good properties of both even and odd
correlations. The odd correlation property of PN sequences should be designed
as well as their even or periodic correlation property so that interference
from data-modulated PN sequences can be made as small as possible.

Definition of EOE Sequences

We define EOE sequences as follows.

Let EOE sequences be the sequences, x and y, having even and odd (auto
and cross) correlation functions whose absolute values at each shift are equal.
That is,

$$|\theta(x)(l)| = |\hat\theta(x)(l)|, \quad |\theta(y)(l)| = |\hat\theta(y)(l)|, \quad |\theta(x,y)(l)| = |\hat\theta(x,y)(l)|$$

for every $l \in \{0, 1, \ldots, N-1\}$, where $\theta(x,y)(l)$, $\hat\theta(x,y)(l)$ and $C(x,y)(l)$ are
the even, odd and aperiodic crosscorrelation functions of the sequences x and y
of period $N$, respectively. They are defined by

$$\theta(x,y)(l) = C(x,y)(l) + C(x,y)(l-N), \qquad (1)$$

$$\hat\theta(x,y)(l) = C(x,y)(l) - C(x,y)(l-N), \qquad (2)$$

$$C(x,y)(l) = \begin{cases} \sum_{n=0}^{N-l-1} x_n\, y^*_{n+l}, & 0 \le l < N, \\ \sum_{n=0}^{N+l-1} x_{n-l}\, y^*_n, & -N < l < 0, \\ 0, & |l| \ge N. \end{cases} \qquad (3)$$

A Method to Generate EOE Sequences

Polyphase PN sequences with such good correlation properties can be
derived from biphase PN sequences with a good aperiodic correlation property
under a certain condition. We propose a method of designing Equivalent Odd
and Even correlation (EOE) sequences.

Method 1. Let u and v be arbitrary real-valued sequences of period N. Then
the complex-valued sequences, x and y, given by

$$x_n = u_n \exp j\left(\frac{\pi k n}{2N} + \theta\right), \quad (n = 0, 1, \ldots, N-1) \qquad (4)$$

$$y_n = v_n \exp j\left(\frac{\pi k n}{2N} + \theta\right), \quad (n = 0, 1, \ldots, N-1) \qquad (5)$$

are EOE sequences when $k$ is an arbitrary odd integer and $\theta$ an arbitrary real
constant satisfying $0 \le \theta < 2\pi$.
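
The EOE property can be checked numerically. The sketch below implements the aperiodic, even and odd correlation functions as defined in (1) to (3) and applies the phase rotation $\exp j(\pi k n / 2N + \theta)$ to real-valued sequences. The function names and the sample sequences are illustrative assumptions, not taken from the paper.

```python
import cmath
import math


def aperiodic(x, y, l):
    """Aperiodic crosscorrelation C(x, y)(l) of two length-N sequences."""
    n_len = len(x)
    if 0 <= l < n_len:
        return sum(x[n] * y[n + l].conjugate() for n in range(n_len - l))
    if -n_len < l < 0:
        return sum(x[n - l] * y[n].conjugate() for n in range(n_len + l))
    return 0j


def even_corr(x, y, l):
    """Even (periodic) correlation: C(l) + C(l - N)."""
    return aperiodic(x, y, l) + aperiodic(x, y, l - len(x))


def odd_corr(x, y, l):
    """Odd correlation: C(l) - C(l - N)."""
    return aperiodic(x, y, l) - aperiodic(x, y, l - len(x))


def eoe_from_real(u, k, theta):
    """Rotate a real sequence u by exp(j * (pi * k * n / (2N) + theta))."""
    n_len = len(u)
    return [complex(u[n]) * cmath.exp(1j * (math.pi * k * n / (2 * n_len) + theta))
            for n in range(n_len)]


if __name__ == "__main__":
    u = [1, -1, 1, 1, -1, 1, 1]      # arbitrary biphase sample sequences
    v = [1, 1, -1, 1, -1, -1, 1]
    x = eoe_from_real(u, k=1, theta=0.3)
    y = eoe_from_real(v, k=1, theta=0.3)
    for l in range(len(u)):
        print(l, abs(even_corr(x, y, l)), abs(odd_corr(x, y, l)))
```

The equality follows because the rotation turns the even and odd correlations into $A + j^k B$ and $A - j^k B$ (up to a common phase) with $A$ and $B$ real, and these two quantities have equal magnitude when $k$ is odd.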
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Summary 

An $n \times n$ Costas array is an $n \times n$ array of blanks and dots with
exactly one dot in each row and column and with an optimum
two-dimensional aperiodic autocorrelation function. In other words, if
the Costas array is shifted vertically and/or horizontally (without
wraparound) and then compared to a fixed copy of itself, at most
a single pair of dots overlap. We denote the Costas array by the
associated permutation $(r_0, \ldots, r_{n-1})$ of $n$ elements, where there
is a dot in position $(i, r_i)$. Costas arrays are used in a variety of
ranging and synchronization applications [1], [2].

Costas arrays exist for arbitrarily large $n$ since there are
constructions for $n = p - 1$ and $n = q - 2$, where $p$ is prime and $q$
is a power of a prime [3]. They are conjectured to exist for all
values of $n$. The smallest value of $n$ for which there are no known
Costas arrays is 32. This case is too large to search exhaustively
with today's algorithms and computer technology. We present
the enumeration of Costas arrays through $n = 32$ that satisfy,
respectively, three different kinds of symmetries. Our aim was to
discover a $32 \times 32$ Costas array or gather new evidence for its
possible nonexistence.

The  three  types  of  symmetries  considered  are: 

1. Diagonal: if there is a dot in position $(i, r_i)$, then there is
a dot in position $(r_i, i)$. For example, (0,4,6,3,1,7,2,5).
Costas arrays obtained from the Lempel or Golomb construction
[3] have diagonal symmetry.

2. Anti-reflective: for $n$ even, $r_i + r_{i+n/2} = n - 1$. For example,
(3,0,5,6,4,7,2,1). Costas arrays obtained from the Welch
construction [3] have anti-reflective symmetry.

3. Consecutive: for $n$ even, $r_i$ and $r_{n-1-i}$ are consecutive. For
example, (6,3,5,0,1,4,2,7).
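
The Costas property and the three symmetry types can be tested directly on a permutation. The following is a minimal sketch with hypothetical function names. The consecutive test pairs position $i$ with position $n-1-i$, which is an assumption chosen because it matches the example permutation given for that symmetry.

```python
def is_costas(r):
    """True if permutation r has distinct differences r[i+h] - r[i] for each h."""
    n = len(r)
    if sorted(r) != list(range(n)):
        return False
    for h in range(1, n):
        diffs = [r[i + h] - r[i] for i in range(n - h)]
        if len(set(diffs)) != len(diffs):
            return False
    return True


def is_diagonal(r):
    """Dot at (i, r_i) implies dot at (r_i, i): r is an involution."""
    return all(r[r[i]] == i for i in range(len(r)))


def is_anti_reflective(r):
    """For even n, r_i + r_{i + n/2} = n - 1."""
    n = len(r)
    return n % 2 == 0 and all(r[i] + r[i + n // 2] == n - 1
                              for i in range(n // 2))


def is_consecutive(r):
    """For even n, r_i and r_{n-1-i} differ by one (assumed pairing)."""
    n = len(r)
    return n % 2 == 0 and all(abs(r[i] - r[n - 1 - i]) == 1
                              for i in range(n // 2))


if __name__ == "__main__":
    examples = [(0, 4, 6, 3, 1, 7, 2, 5),
                (3, 0, 5, 6, 4, 7, 2, 1),
                (6, 3, 5, 0, 1, 4, 2, 7)]
    for r in examples:
        print(r, is_costas(r), is_diagonal(r),
              is_anti_reflective(r), is_consecutive(r))
```

A symmetry-restricted search only needs to choose about half of the entries of the permutation, since the symmetry fixes the rest; this is what makes enumeration at larger sizes tractable.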

Efficient  search  algorithms  that  take  advantage  of  the  as¬ 
sumed  symmetry  were  developed  and  implemented  using  parallel 
computer  processing.  The  enumeration  through  n  =  32  was  com¬ 
pleted  for  each  type  of  symmetry.  Figure  1  lists  the  number  of 
equivalence  classes  of  Costas  arrays  found  for  each  case.  There 
are  no  diagonal  symmetric  Costas  arrays  for  n  =  24,  28,  31,  and 
32. This enumeration extends the enumeration described in [4]
from size 22. There are no anti-reflective symmetric arrays for
n  =  24,  26,  and  32.  There  are  no  consecutive  symmetric  arrays 
for  even  n  with  22  <  n  <  32. 

All  Costas  arrays  have  been  enumerated  previously  through 
size  n  =  20.  They  have  been  decreasing  in  number  since  n  =  17. 
This fact and the above symmetric results contribute to a growing
sense that there may not be a Costas array of size 32. We have
extended this enumeration to n = 21: there are 3536 Costas arrays
of  size  21. 

We  have  constructed  a  database  of  all  Costas  arrays  up  to 
size  n  =  21  and  have  used  this  database  to  analyze  the  underlying 
structure  of  the  arrays.  Such  an  analysis  could  be  used  to  expe¬ 
dite  the  enumeration  for  higher  n,  to  find  new  constructions,  and 
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MITRE  Sponsored  Rei.earch  Program. 
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Office  Cornell  Mathematical  Sciences  Institute,  by  the  Office  of  hfavaf  Research 
under  grant  number  N000I4-Sn-J-|30I.  and  by  the  NSF-EPSCoR  of  Puerto  Rico 
Project. 
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possibly  to  suggest  an  approach  for  proving  the  nonexistence  of  a 
32 × 32 Costas array. As one example, we measured the size of the
largest common prefix of adjacent arrays when the Costas arrays are
ordered lexicographically. By n = 20, the largest common prefix
size  is  only  6.  Some  of  the  other  properties  investigated  include 
the  distribution  of  the  dots  of  the  arrays  (there  is  a  tendency  for 
the  dots  to  form  an  annulus)  and  the  number  of  dots  by  quad¬ 
rants  (there  is  a  strong  tendency  of  the  dots  to  be  distributed 
equally  in  the  four  quadrants  when  n  is  even).  We  also  investi¬ 
gated  the  cycle  structure  and  ascent  properties  of  Costas  arrays 
in  comparison  with  random  permutations. 
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Abstract 

We will study the distribution of the longest upward and downward
monotonic subsequences contained in sequences, or permutations of
1, 2, ..., n. Thereby, a famous Ramsey-type existence theorem for
sequences having a specific property is refined by a precise counting
technique.

Summary 

A subsequence of the permutation $\pi_1 \pi_2 \ldots \pi_n$ of $\{1, \ldots, n\}$ is a
sequence considered in the same order as the numbers appear in the
permutation. For example, 142 is a subsequence of permutation
7134652, but 753 is not. Given any permutation, a subsequence
$\pi_{i_1} \pi_{i_2} \ldots \pi_{i_m}$ ($i_1 < i_2 < \ldots < i_m$) is upward monotonic if it is always
increasing, that is, $\pi_{i_1} < \pi_{i_2} < \ldots < \pi_{i_m}$. Similarly, a subsequence is
downward monotonic if it is always decreasing. We are interested
in the pair $(d, u)$ of the longest lengths of downward and upward monotonic
subsequences contained in a given permutation of order $n$. Our main
concern is the determination of the number $c^{(n)}(d,u)$ of permutations
of order $n$ having the pair $(d, u)$ for any $1 \le d, u \le n$.
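
For small $n$ the numbers $c^{(n)}(d,u)$ can be obtained by brute force, which is useful as a check on any closed-form or recursive formula. The sketch below is such a check, not the counting technique of the paper; the function names are hypothetical. It also confirms the Erdős-Szekeres consequence that the smallest achievable value of $\max(d,u)$ is $\lceil \sqrt{n} \rceil$.

```python
from collections import Counter
from itertools import permutations
from math import isqrt


def longest_monotone(perm):
    """Return (d, u): longest decreasing and increasing subsequence lengths."""
    n = len(perm)
    inc = [1] * n   # inc[j]: longest increasing subsequence ending at j
    dec = [1] * n   # dec[j]: longest decreasing subsequence ending at j
    for j in range(n):
        for i in range(j):
            if perm[i] < perm[j]:
                inc[j] = max(inc[j], inc[i] + 1)
            else:
                dec[j] = max(dec[j], dec[i] + 1)
    return max(dec), max(inc)


def count_pairs(n):
    """Counter mapping (d, u) to the number of permutations of order n."""
    return Counter(longest_monotone(p) for p in permutations(range(1, n + 1)))


def ceil_sqrt(n):
    """Smallest integer whose square is at least n."""
    r = isqrt(n)
    return r if r * r == n else r + 1


if __name__ == "__main__":
    n = 5
    counts = count_pairs(n)
    for (d, u), c in sorted(counts.items()):
        print(d, u, c)
    print(min(max(d, u) for (d, u) in counts), ceil_sqrt(n))
```

The enumeration grows as $n!$, so it is only practical for small orders.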

This problem contains the famous theorem of Ramsey type as a special
case. That theorem, due to Erdős and Szekeres [1],[2], states that
any sequence of $n^2 + 1$ distinct numbers contains a monotone (downward
or upward) subsequence of length $n + 1$. This theorem can be
expressed in our notation as

$$\lceil \sqrt{n} \rceil = \min_{c^{(n)}(d,u) > 0} \max(d, u).$$

We can give the number $c^{(n)}(d,u)$ by making full use of the
properties of the Pascal triangle of binomial coefficients and their
combinatorial meanings. For example, we obtain the table of the numbers
$c^{(n)}(d,u)$ for $n = 10$ as depicted in the following:


To reveal the riddle of the numbers in such tables, we must prepare
some interesting recursion formulas for auxiliary tables induced from
the Pascal triangle. As byproducts we have some formulas such as


c*"^(d,n  -  d-l- 1)  = 


=  (-)■ 


/c(’')(d,2)  = 


for  d  <  rn/2l  or  d  =  n 
/c("-')(d-  1,2)  -I-  v/c<"-')(d,2)  for  rn/2]  <d<n-l 
/c("~')(n  -  2,  2)  -I-  1  for  d  =  n  -  1 
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hitroduction 

This paper is concerned with the construction of new
families of biphase sequences which are obtained through a
polynomial mapping from the ring Z4 to GF(2). Such sequences
are of interest in code division spread spectrum multi-user
communication systems.

Optimal and suboptimal families of quadriphase
sequences derived from maximal length sequences
(m-sequences) and interleaved maximal length sequences
(Im-sequences) over Z4 are given in [1,2]. The period of
m-sequences is $2^r - 1$ and that of Im-sequences is $2(2^r - 1)$. The
families of $2^r + 1$ biphase sequences of period $2^r - 1$ derived from
families of m-sequences over Z4 are optimal like Gold families
and are reported in [6]. The linear complexity (LC) of these
sequences is lower bounded by $r(r+1)/2$.

In this paper, we derive biphase families of size $2^{r-1} + 1$, $r$
a positive integer, from families of Im-sequences over Z4. Most
of the families satisfy the Sidelnikov bound on $\theta_{\max}$,
where $L$ is equal to the period of the sequences and $\theta_{\max}$ is equal
to the maximum magnitude of the periodic crosscorrelation
and out-of-phase autocorrelation values. One of the families
satisfies the Welch bound on $\theta_{\max}$, while the rest of the
families are suboptimal ($\theta_{\max}$ is bounded by $2\sqrt{L}$). The linear
complexity of all sequences is equal to $r(r+3)/2$ with the
exception of the single m-sequence. Sequence imbalance and
correlation distributions are also computed.

Main  Results 

We consider a non-linear polynomial (MCP) mapping
from Z4 to its ideal <2>, given by $p(x) = x^2 - x$. The ideal
<2> is isomorphic to the binary field, and the quadriphase
mapping given by $\phi(x) = \omega^x$, where $\omega = \sqrt{-1}$, applied to the sequences
over the ideal results in biphase sequences. Thus biphase
families are constructed from families of Z4 sequences by using
the MCP mapping $p(x)$ given above. The families of Z4
sequences considered for biphase sequence construction in this
paper are the families of Im-sequences over Z4 (or simply IM
families), of period $2(2^r - 1)$. Each family consists of $2^{r-1} + 1$
sequences [1,2]. The definition of IM families depends on the
structure of the Galois extension ring of Z4, GR(4,r), $r$ a
positive integer. The group of units GR*(4,r) of a Galois ring
GR(4,r) is a direct product of two groups $G_A$ and $G_C$, where $G_A$
is an Abelian group of order $2^r$, and $G_C$ is the cyclic component
of order $2^r - 1$. Associated with every element $\gamma$ of $G_A \subset$
GR*(4,r), $\gamma \ne 1$, an IM family is defined. Three classes of IM
families are identified depending on the nature of $\gamma$. They are

(a)  XjIP  fuailka  with  traee(7)  w  1 

(b)  J^familiaa  with  trace(7)  a  0,  7#  1 

(c)  JCd^  family 

wlwre  7*1  +  2(7),  7*7'  mod  2,  7*  6  0^.  A  paper  by  Boataa, 

Bammona  and  Kumar  [4]  containa  aome  of  the  Z4  familiea 
givm  above.  The  fMnihaa  givm  in  [4]  correipond,  aa  per  the 

above  claarification,  to  XJP  with  tr(7)  s  1.  Tha  X/i^ 

fuuliaa  with  tT(7)  >  0  tod  the  JCd?  family  are  additional 

aub"4amiliee  among  JM  funiliaa  of  period  2(2'^-l).  Thaae 
familieo  through  and  quadriidiaoo  mappinp  viald 

biphaae  fmiiliea.  The  biphaae  nmiliea  thna  conatructed  are 
nmned  by  prefixiiigthe  arord  JfJtP  to  their  correaponding  Z4 
famihea.  The  mapping  conaidered  in  thia  paper  ia 


emiivalent  to  quartemary  to  bina^  tranafoimation  ghrm  in 
[SQ.  The  corrdation  propcrtiea  of  binary  famihea  are  camnitad 
thoae  of  Z4  froniliea  by  nautf  a  method  given  in  [^.  The 
bounda  on  the  of  binary  famuiea  givm  [3]  are  improm  by 

making  uae  of  pacific  corraation  propertieo  of  XJt  froniliee. 
The  corrriation  diatributiona  and  aequence  imbalanre  are 
confuted  by  making  uae  of  the  propertiea  of  Qaloia  ring 
QIl(4,r).The  corrriation  {wtpertiea  of  biidiaae  familiea  derived 
are  tabulated  in  TaUe  1.  The  LC  of  reauhant  binary  aequencea 
ia  computed  throu^  the  LC  analyaia  of  correywiding  binary 
ideal  aequmcea  ov«  Z4.  A  generabaed  verrion  of  Blahut'a 
theorem,  which  relatea  U?  of  a  aequence  over  Z4  to  the  number 
of  non-amo  poaitiona  in  ita  Fourier  tranaform,  ia  uaed  to 
compute  the  LC  of  ideal  aeaumcea.  Since  thia  ideal  ia 
iaomorphic  to  the  binary  field,  LC  of  binarv  ideal  aequencea 
thua  computed  via  Blahut'a  thoorem  ia  indeed  LC  of  the  binary 
aequencea.  The  LC  of  all  aequmcea  derived  from  •XB&miliaa  ia 
equal  to  r(r-|-3)/2  with  aa  exception  of  the  ain^  m-aequence. 
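
The two-step construction of a biphase sequence from a Z4 sequence can be sketched as follows, taking the polynomial mapping to be $p(x) = x^2 - x \pmod 4$ and the quadriphase mapping to be $\omega^x$ with $\omega = \sqrt{-1}$. The function names and the sample input are illustrative assumptions.

```python
def mcp(x):
    """Polynomial mapping p(x) = x^2 - x mod 4; its image is the ideal {0, 2}."""
    return (x * x - x) % 4


def quadriphase(x):
    """Quadriphase mapping x -> omega^x with omega = sqrt(-1)."""
    return 1j ** (x % 4)


def biphase(seq):
    """Map a Z4 sequence to a +1/-1 sequence via p(x) followed by omega^x."""
    out = []
    for x in seq:
        value = quadriphase(mcp(x))   # omega^0 = 1 or omega^2 = -1
        out.append(int(round(value.real)))
    return out


if __name__ == "__main__":
    z4_sequence = [0, 1, 2, 3, 3, 1, 0, 2]   # arbitrary sample, not an Im-sequence
    print([mcp(x) for x in z4_sequence])
    print(biphase(z4_sequence))
```

Because the image of $p$ contains only 0 and 2, the quadriphase mapping can only produce $+1$ and $-1$, which is why the resulting sequences are biphase.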


ThblelNowI 
Family  Ske:  2*“*+l;  LC:  r(r+3)/2;  Period;  2(2*-l) 


Family
r
Comment
(t**)
JTJX-XJP
odd
even
2(1+2^'"*^’))
2(1+2^'/*))
Optimal
(W^
Optimal
tr(γ)=1
jtjj-jjP
(Sidelnikov)
odd
2(1+2^*+*/’))
Sub optimal
tr(γ)=0
(Sidelnikov)
tr(γ)=1
jrjtx-jjP
even
2(i+2(«+1/2))
Sub optimal
tr(γ)=0
(Sidelnikov)
Abstract

Given positive integers r, s, u and v, an (r,s;u,v)
Perfect Map (PM) is defined to be a periodic r x s
binary array in which every u x v binary array
appears exactly once as a subarray. Perfect Maps
are the natural extension of the de Bruijn sequences
to two dimensions.

In this paper we settle the existence question for
Perfect Maps by proving the following result.

Let r, s, u, v be positive integers. Then there exists
an (r,s;u,v) PM if and only if the following three
conditions hold:

i) rs = 2^(uv),

ii) r > u or r = u = 1,

iii) s > v or s = v = 1.

We make extensive use of previously known
constructions by finding new conditions guaranteeing
their repeated application. These conditions are
expressed as bounds on the linear complexities of
the periodic sequences formed from the rows and
columns of Perfect Maps.
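
As an illustration of the definition only (not of the paper's constructions), the following sketch checks the Perfect Map property by brute force. The function name `is_perfect_map` and the 4 x 4 example array are assumptions introduced here; the example is a (4,4;2,2) array whose sixteen periodic 2 x 2 windows are all distinct.

```python
from itertools import product


def is_perfect_map(array, u, v):
    """Return True if every u x v binary window occurs exactly once
    in the periodic r x s binary array (hypothetical helper)."""
    r, s = len(array), len(array[0])
    # Necessary counting condition: r*s windows must cover 2^(uv) patterns.
    if r * s != 2 ** (u * v):
        return False
    seen = set()
    for i, j in product(range(r), range(s)):
        window = tuple(
            array[(i + di) % r][(j + dj) % s]  # indices wrap around (periodic)
            for di in range(u)
            for dj in range(v)
        )
        seen.add(window)
    # r*s windows and 2^(uv) distinct patterns means each appears exactly once.
    return len(seen) == 2 ** (u * v)


# Illustrative (4,4;2,2) example, chosen for this sketch.
EXAMPLE = [
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
]
```

Here r = s = 4 > u = v = 2 and rs = 16 = 2^(uv), so the example is consistent with the three conditions stated above.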
Abstract

A “radar array” is a matrix of zeros and ones which has
small one-dimensional autocorrelation sidelobes. New general
constructions and new upper bounds for the size of radar
arrays are presented.


Summary


A radar array R is an N x M matrix of ones and zeros with a
single one per column, such that the horizontal autocorrelation
function only has values 0, 1, ..., k, and M, where k is the
maximal allowable sidelobe (generalized from [2] and [3]). We
say a radar array is optimal if it has the maximum M for
given values of N and k. Denote that maximum number as
GR_k(N). Previously the best known asymptotic upper and
lower bounds for GR_1(N)/N had a gap of about 0.463 ([4]).
This gap is shrunk to about 0.101 in this paper.
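
The definition can be made concrete with a small sketch. It assumes the horizontal autocorrelation is aperiodic (shifts run off the edge of the array rather than wrapping), and it stores the array compactly as `col_rows`, a hypothetical list giving the row index of the single one in each column.

```python
from collections import Counter


def horizontal_sidelobes(col_rows):
    """Count, for each horizontal shift t >= 1, the pairs of ones that lie
    in the same row exactly t columns apart (aperiodic shifts assumed)."""
    counts = Counter()
    m = len(col_rows)
    for j in range(m):
        for t in range(1, m - j):
            if col_rows[j] == col_rows[j + t]:
                counts[t] += 1
    return counts


def max_sidelobe(col_rows):
    """Largest off-peak autocorrelation value; the array qualifies as a
    radar array with parameter k when this value is at most k."""
    return max(horizontal_sidelobes(col_rows).values(), default=0)


# A 2 x 4 array: ones in row 0 at columns 0, 1, 3 and in row 1 at column 2.
small = [0, 0, 1, 0]
```

For `small`, the spacings within row 0 are 1, 2 and 3, all distinct, so the largest sidelobe is 1.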

We applied the Erdős-Turán ([1]) argument to obtain the
upper bounds. Suppose a window of width K (N < K < M)
is superimposed on the radar array. Slide the window
from the left end of the array to the right end and count the
number of spacings (the distances between any two 1's in the
same row) within the windows. By estimating the minimum
number of spacings in each window and considering how many
windows contain a particular spacing, we obtain the following
theorem. (For details, see [5].)

Theorem  l!  GRk(N)  <  minigiil  +  Fotll.  min  ff3(p)), 
where 

/,,,!.  6^  2k -hp,,  X-h 

Pt  =  yi  +  6k- gi{p)=  ^  N;  jj(p)  = - p - 

X  =  2(5)iV  +  *pAf-fc;  V'  =  2(p-1) 

Z  =  AT*  -  4k  -  2k’pW  +  k’ 


2. Substitute any zero in the rows with an L x L all-zero
matrix.

3. Substitute the first 1 in each row with A.

4. Substitute the second 1 in each row with B.

5. Substitute the third (if any) 1 in each row with C.

For a proof of this construction, see [5]. The best
GR_k(N)/N ratios achievable by this method are 306/113 when
k = 1 and 4 when k = 2. Combined with the result of
Theorem 1, we have 2.708 ≤ lim_{N→∞} GR_1(N)/N ≤ 2.809 and
4 ≤ lim_{N→∞} GR_2(N)/N ≤ 4.276.
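
The substitution rules can be sketched in a few lines. The sketch assumes that a permutation vector P stands for the L x L matrix with a one in row i, column P[i] (1-indexed), and that sidelobes are aperiodic; the helper names and the 2 x 4 base array are illustrative choices, not taken from [5].

```python
def perm_matrix(p):
    """L x L 0/1 matrix with a one in row i, column p[i] - 1 (assumed convention)."""
    n = len(p)
    return [[1 if p[i] - 1 == j else 0 for j in range(n)] for i in range(n)]


def expand(base, blocks):
    """Replace each 0 of `base` by an L x L zero block and the n-th 1 of
    each row by the permutation matrix of blocks[n]."""
    size = len(blocks[0])
    zero = [[0] * size for _ in range(size)]
    out = []
    for row in base:
        seen = 0  # number of ones met so far in this base row
        tiles = []
        for entry in row:
            if entry:
                tiles.append(perm_matrix(blocks[seen]))
                seen += 1
            else:
                tiles.append(zero)
        for i in range(size):
            out.append([x for tile in tiles for x in tile[i]])
    return out


def max_sidelobe(array):
    """Largest horizontal (aperiodic) autocorrelation value over shifts t >= 1."""
    m = len(array[0])
    best = 0
    for t in range(1, m):
        hits = sum(row[j] & row[j + t] for row in array for j in range(m - t))
        best = max(best, hits)
    return best


# L = 5 permutation vectors A, B, C and an illustrative 2 x 4 base array.
A, B, C = [1, 2, 3, 4, 5], [1, 3, 5, 2, 4], [2, 1, 5, 4, 3]
base = [[1, 1, 0, 1], [0, 0, 1, 0]]
big = expand(base, [A, B, C])
```

In this example the base spacings 1, 2 and 3 are spread over the disjoint ranges 3..7, 8..12 and 13..17, so the 10 x 20 result keeps k = 1.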
Table 1: Some Upper and Lower Bounds for the k = 1 Case


L = 5

A = [1,2,3,4,5]^T

B = [1,3,5,2,4]^T

C = [2,1,5,4,3]^T

D = [3,1,4,2,5]^T

L = 7

A = [1,2,3,4,5,6,7]^T

B = [1,3,5,7,2,4,6]^T

C = [3,1,6,4,2,7,5]^T

L = 13

A = [1,2,3,4,5,6,7,8,9,10,11,12,13]^T

B = [1,3,2,7,11,10,4,13,5,12,9,6,8]^T

C = [5,3,8,2,7,12,1,11,4,6,10,9,13]^T

Table 2: Some Permutation Matrices with Pairwise Properly Centered
Sets of Differences


New heuristic methods are used to search for radar arrays
with smaller sizes. Table 1 summarizes new upper and lower
bounds for the k = 1 case. Next we introduce a new
construction for the radar arrays which gives asymptotic lower
bounds.

Define two L x L (L odd) permutation matrices A, B to be
“properly centered” if their row-by-row differences range
exactly from -(L-1)/2 to (L-1)/2. Table 2 shows some examples of
such permutation matrices. (The existence of such permutation
matrices of other odd sizes is still unknown. It would also
be useful if one could find more than 4 such matrices.)
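
A quick check of this property on the L = 5 entries of Table 2 might look as follows. It assumes that each vector lists, row by row, the column of the one in the permutation matrix, and that "range exactly" means every difference from -(L-1)/2 to (L-1)/2 occurs exactly once; the names below are illustrative.

```python
from itertools import combinations


def properly_centered(p, q):
    """True if the row-by-row differences p[i] - q[i] take each value
    from -(L-1)/2 to (L-1)/2 exactly once (assumed reading of the definition)."""
    half = (len(p) - 1) // 2
    diffs = sorted(a - b for a, b in zip(p, q))
    return diffs == list(range(-half, half + 1))


# L = 5 permutation vectors as read from Table 2.
TABLE2_L5 = {
    "A": [1, 2, 3, 4, 5],
    "B": [1, 3, 5, 2, 4],
    "C": [2, 1, 5, 4, 3],
    "D": [3, 1, 4, 2, 5],
}

# Check every pair of the four matrices.
all_pairs_ok = all(
    properly_centered(TABLE2_L5[x], TABLE2_L5[y])
    for x, y in combinations(TABLE2_L5, 2)
)
```

Under this reading all six pairs among A, B, C, D pass the check.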

Given any k = 1 radar array of size N x M with at most
three 1's in each row (such radar arrays exist for all small N),
a new k = 1 radar array of size NL x ML can be constructed
according to the following rules.

1. Choose a set of A, B, C from Table 2.

This research is supported in part by NSF under Grant
NCR-8905052.


[1] H. Halberstam and K.F. Roth, Sequences, vol. I. Oxford:
Clarendon, 1966.

[2] S.W. Golomb and H. Taylor, “Two-dimensional synchronization
patterns for minimum ambiguity,” IEEE Trans. Inform. Theory,
vol. IT-28, no. 4, pp. 600-604, July 1982.

[3] J.P. Robinson, “Golomb rectangles,” IEEE Trans. Inform.
Theory, vol. IT-31, no. 6, pp. 781-787, Nov. 1985.

[4] A. Blokhuis and H.J. Tiersma, “Bounds for the size of
radar arrays,” IEEE Trans. Inform. Theory, vol. 34, no. 1,
pp. 164-167, Jan. 1988.

[5] Z. Zhang and C. Tu, “New bounds for the size of radar
arrays,” submitted to IEEE Trans. Inform. Theory.




FAMILIES  OF  FOUR-PHASE 
QUASI-ORTHOGONAL  CODE  ARRAYS 

Serdar Boztaş

Department  of  Electrical  and  Computer  Systems  Engineering 

Monash  University,  Clayton,  Victoria  3168,  Australia 

Abstract 

A method of construction is presented for rectangular
quasi-orthogonal code arrays over the ring $Z_4$.

The proposed arrays are easy to generate: the four-phase linear
recurring sequences constructed by Boztaş et al. are utilized
to generate the arrays by modulo 4 subtraction. The periodic auto-
and cross-correlation properties of these arrays are then derived in
a straightforward manner. The maximum off-peak correlation
magnitude for these arrays is lower by a factor of $\sqrt{2}$ when
compared to the binary Gold code arrays constructed by Kuo and Rigas.

The arrays can be used for 'add-on' data transmission, pattern
synchronization, and code division image multiplexing.

INTRODUCTION 

Kuo and Rigas introduced binary quasi m-arrays and quasi Gold
arrays in [1]. These arrays were proposed to overcome some
disadvantages associated with the m-arrays studied by Nomura et al.
[2] and MacWilliams and Sloane [3]. Their construction for quasi
m-arrays yields $L_1 \times L_2$ binary arrays where $L_i = 2^{n_i} - 1$,
with $n_i$ positive integers for $i = 1, 2$.

Given two binary sequences, say $s_1(t)$, $t = 0, 1, \ldots, L_1 - 1$ and
$s_2(t)$, $t = 0, 1, \ldots, L_2 - 1$ (these sequences can either be two
m-sequences or two Gold sequences), Kuo and Rigas use the
construction $a(t_1, t_2) = s_1(t_1) \oplus s_2(t_2)$ where $\oplus$ denotes
modulo 2 addition.

The maximum off-peak auto-correlation for the quasi m-arrays
is given by $\max\{L_1, L_2\}$ while the maximum cross-correlation
magnitude depends on the choice of the m-sequences. This is one
reason for the introduction of Gold code arrays in [1]. The maximum
off-peak auto- and cross-correlation magnitude for the Gold code
arrays is given by
$\theta_{\max} = \max\{L_1 (\sqrt{2 L_2} + 1),\; L_2 (\sqrt{2 L_1} + 1)\}$.

THE  NEW  CONSTRUCTION 

In this paper four-phase quasi-orthogonal arrays are introduced.
The method used for the construction of these arrays from
four-phase sequences is similar to the method used in [1].

Boztaş, Hammons and Kumar [4] constructed families of
four-phase linear recurring sequences with near optimum correlation
properties. These sequences are used here to construct new families
of four-phase quasi-orthogonal code arrays. The reader is referred
to [4] for a tabulation of generating polynomials (hence recursion
coefficients) for these sequences. The sequences that are used in the
construction here are referred to as the family A in that paper and
are defined as all the nonzero sequences satisfying a given linear
recursion over $Z_4$.

Given two four-phase sequences (say $s_1(t)$, $t = 0, 1, \ldots, L_1 - 1$
and $s_2(t)$, $t = 0, 1, \ldots, L_2 - 1$, where $L_i = 2^{n_i} - 1$, with $n_i$ a
positive integer for $i = 1, 2$) belonging to family A, the four-phase
quasi-orthogonal array $a(t_1, t_2)$ of size $L_1 \times L_2$ can be
constructed by

$$a(t_1, t_2) = s_1(t_1) \ominus s_2(t_2) \qquad (1)$$

where $\ominus$ denotes modulo 4 subtraction.

Definition 1 The cross-correlation between two four-phase arrays
$a(t_1, t_2)$ and $b(t_1, t_2)$ of the same dimensions $L_1 \times L_2$ is given by

$$\theta_{ab}(\tau_1, \tau_2) = \sum_{t_1=0}^{L_1-1} \sum_{t_2=0}^{L_2-1} \omega^{a(t_1+\tau_1,\, t_2+\tau_2) - b(t_1, t_2)} \qquad (2)$$

where $0 \le \tau_i < L_i$, and the sums $t_i + \tau_i$ are interpreted modulo $L_i$,
for $i = 1, 2$, and $\omega$ is defined as $\omega = \sqrt{-1}$.


Theorem 1 (a) The cross-correlation between two four-phase
quasi-orthogonal arrays satisfies

$$|\theta_{ab}(\tau_1, \tau_2)| \le \theta_{\max}(L_1)\, \theta_{\max}(L_2), \qquad (3)$$

where $\theta_{\max}(L)$ is the maximum off-peak auto- and cross-correlation
magnitude of the four-phase sequences in family A of length L.

(b) The auto-correlation of a four-phase quasi-orthogonal array
satisfies

$$|\theta_{aa}(\tau_1, \tau_2)| \le \max\{L_1\, \theta_{\max}(L_2),\; L_2\, \theta_{\max}(L_1)\}. \qquad (4)$$

Proof The proof is straightforward. Denote the two sequences in
family A used to generate $b(t_1, t_2)$ by $s_1'$ and $s_2'$, i.e.,
$b(t_1, t_2) = s_1'(t_1) \ominus s_2'(t_2)$ and $a(t_1, t_2) = s_1(t_1) \ominus s_2(t_2)$.
Substituting this in

$$\theta_{ab}(\tau_1, \tau_2) = \sum_{t_1=0}^{L_1-1} \sum_{t_2=0}^{L_2-1} \omega^{a(t_1+\tau_1,\, t_2+\tau_2) - b(t_1, t_2)} \qquad (5)$$

gives

$$\theta_{ab}(\tau_1, \tau_2) = \sum_{t_1=0}^{L_1-1} \sum_{t_2=0}^{L_2-1} \omega^{s_1(t_1+\tau_1) - s_2(t_2+\tau_2) - s_1'(t_1) + s_2'(t_2)} \qquad (6)$$

or

$$\theta_{ab}(\tau_1, \tau_2) = \Big[\sum_{t_1=0}^{L_1-1} \omega^{s_1(t_1+\tau_1) - s_1'(t_1)}\Big] \Big[\sum_{t_2=0}^{L_2-1} \omega^{s_2'(t_2) - s_2(t_2+\tau_2)}\Big]. \qquad (7)$$

Note that the right hand side is just a product of two
cross-correlations between pairs of sequences from family A. Case (a)
follows directly from this.

In case (b), $s_1 = s_1'$ and $s_2 = s_2'$ as sequences and therefore when
either $\tau_1 = 0$ or $\tau_2 = 0$, the corresponding sum yields $L_1$ or $L_2$
respectively, which proves the claim for auto-correlation. □

The theorem yields the immediate corollary below.

Corollary 2
$\theta_{\max} = \max\{L_1 (\sqrt{L_2 + 1} + 1),\; L_2 (\sqrt{L_1 + 1} + 1)\}$
for the quasi-orthogonal four-phase arrays constructed here, and is
lower than that of the Gold code arrays in [1] by a factor of
$\approx \sqrt{2}$.

Proof $\theta_{\max}(L) = 1 + \sqrt{L + 1}$ for family A (see [4]) while it is
$1 + \sqrt{2L}$ for Gold sequences. □
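
The factorization used in the proof of Theorem 1 can be checked numerically. The sketch below assumes the convention that the first array is the shifted one and that ω = √-1; the short sequences are arbitrary $Z_4$-valued examples chosen for illustration, not members of family A, and all function names are hypothetical.

```python
OMEGA = (1, 1j, -1, -1j)  # powers of omega = sqrt(-1), indexed modulo 4


def seq_corr(s, r, tau):
    """Periodic cross-correlation sum of omega^(s(t + tau) - r(t))."""
    n = len(s)
    return sum(OMEGA[(s[(t + tau) % n] - r[t]) % 4] for t in range(n))


def quasi_array(s1, s2):
    """a(t1, t2) = s1(t1) - s2(t2) modulo 4."""
    return [[(x - y) % 4 for y in s2] for x in s1]


def array_corr(a, b, tau1, tau2):
    """Two-dimensional periodic cross-correlation with `a` shifted."""
    n1, n2 = len(a), len(a[0])
    return sum(
        OMEGA[(a[(t1 + tau1) % n1][(t2 + tau2) % n2] - b[t1][t2]) % 4]
        for t1 in range(n1)
        for t2 in range(n2)
    )


# Arbitrary illustrative sequences over Z4 (lengths 3 and 7).
s1, r1 = [0, 1, 3], [2, 2, 1]
s2, r2 = [0, 1, 1, 2, 3, 0, 2], [3, 0, 2, 1, 1, 0, 3]
a, b = quasi_array(s1, s2), quasi_array(r1, r2)


def factorization_gap(tau1, tau2):
    """Difference between the array correlation and the product form."""
    product = seq_corr(s1, r1, tau1) * seq_corr(s2, r2, tau2).conjugate()
    return abs(array_corr(a, b, tau1, tau2) - product)
```

Because of the subtraction in the construction, the second factor appears as a complex conjugate under this convention, which does not affect the magnitudes used in the bounds.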
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We  consider  p-nary  GMW-sequences  of  length  —  1 
which  are  defined  as 


s(n)  =  exp  with  a(n)  =  trf(try( a 

^  (1) 
and  some  restrictions  on  the  parameters  J,d  and  r  (see 
[1,  2]).  tr(  )  denotes  the  trace  function  from  the  finite 
field  GF(p^)  onto  GF(p'^),  and  a  is  a  primitive  ele¬ 
ment  of  GF(p*^).  The  periodic  crosscorrelation  function 
(PCF)  of  s  and  g  with  ff(n)  =  exp  tr f  (tr^  (a*")* 

is  defined  by 


p“-2 

^  ^•(n)g(n  +  k), 
n=0 


where n + k is taken modulo p*^ - 1. The crosscorrelation
function depends on the parameters d, r, e and s. Therefore
we write $\varphi_{sg}(k) = \varphi_{d,r,e,s}(k)$. The periodic
crosscorrelation function of two p-nary m-sequences becomes
$\varphi_{d,1,e,1}(k)$ in this notation because s and g are equal to
m-sequences for r = s = 1.

The paper aims at the calculation of the correlation function
of GMW-sequences by reducing it to the PCF of
ordinary m-sequences, because results on the crosscorrelation
functions of m-sequences are well known (see [3] for
a compressed description for p = 2 and [4, 5] for p > 2).
One possible way is the description by the crosscorrelation
function of shorter m-sequences with length p'^ - 1.
This was done in the papers [6, 7] for d = e and r = 1, so
that the crosscorrelation of an m-sequence (r = 1) and a
GMW-sequence with the 'same primitive polynomial' (d = e)
is known up to now. We generalize this result to the case
r ≠ 1, so that for the first time the crosscorrelation of
two GMW sequences is investigated:

Theorem  1  The  crosscorrelation  for  d  =  e  =  I  is 

•fil.r.l.sik)  = 


f  p^--’{(Pr,,{k/T)  +  1)  -  1,  fork  =  0modT 

l-l,  else, 


where  ,  denotes  the  crosscorrelation  function  of  the  m- 
sequences  exp(j2ir  tr/(7'"’)/p)  and  exp(j2)r  trf  (7*")/p) 
of  length  p-’  -  I  (7  =  =  (p"  -  l)/(p-'  -  ijj. 


*Mr. Antweiler is now with CADIS, Kaitentr. 100, D-5I20
Another way is the reduction of the PCF of GMW-sequences
to the PCF of m-sequences with the same
length: $\varphi_{d,r,e,s}(k) = \varphi_{d,1,e,1}(k)$, whereby restrictions
have to be fulfilled by the parameters d, r, e and s. For two
cases we found a description of the PCF of GMW
sequences in this form:

Theorem  2  The  crosscorrelation  function  for  r  =  s  and 
d  =  cp*  mod  p"^  —  1  is 

<Pd,r,e,r(k)  =  ^d,l,e,l(k). 

This theorem allows the calculation of the PCF of GMW
sequences having the same linear span, because the linear
span depends only on r and s. (The linear span is the
minimal degree of a linear recursion satisfied by a(n) in
eq. (1).)

Theorem  3  The  crosscorrelation  function  for  d  = 
— esp*  mod  p"^  —  1  and  rs  —  p*  mod  p’’  —  1  is 

'Pd,T,t,t(k)  = 

The  meaning  of  the  condition  rs  =  p'  or  r  =  p's“*  is  that 
the  nonlinear  mappings  which  are  performed  by  raising 
to  the  rth  and  sth  power  are  inverse  to  one  another. 
With  known  results  on  the  PCF  of  m-sequences  these 
three  theorems  allow  the  calculation  of  the  PCF  of  GMW 
sequences  for  many  cases. 
ABSTRACT

We propose two families of complex sequences with components on
the unit circle (PSK sequences). Each sequence of a family has
perfect auto-correlation, i.e., all "out-of-phase" correlation
coefficients are equal to zero. The magnitudes of all cross-correlation
coefficients of any pair of sequences in a family are equal to the square
root of the sequence length n. Thus both families are asymptotically
optimal with respect to the Sidelnikov-Welch lower bound.

1.  SUMMARY 

Let $\mathcal{M} = \{X(m) = (x_0(m), x_1(m), \ldots, x_{n-1}(m)),\ m = 1, 2, \ldots, M\}$
denote a family of complex sequences of length n. Let

$$R_\tau(i, i) = R_\tau(i) = \sum_{s=0}^{n-1} x_s(i)\, x^*_{s+\tau}(i), \qquad (1)$$

and

$$R_\tau(i, j) = \sum_{s=0}^{n-1} x_s(i)\, x^*_{s+\tau}(j), \quad i \ne j,$$

denote the periodic auto- and cross-correlation coefficients,
respectively. Here $x^*$ denotes the complex conjugate of x, and
subscripts are calculated modulo n.


Proof. If an integer t has a representation $t = a p^k + b$, then the
integer s + t has a representation $s + t = (u + a + \epsilon) p^k + (v + b - \epsilon p^k)$,
where $\epsilon = 0$ if $v + b \le p^k - 1$, and $\epsilon = 1$ if $v + b \ge p^k$. Thus

The inner sum in (7) is equal to 0 if $(i - j)v - jb \ne 0$, and is equal
to $p^k = \sqrt{n}$ if $(i - j)v - jb = 0$. This equation has a unique solution
$v_0 = (i - j)^{-1} jb$, since $i - j \ne 0$ modulo p. Thus

R_t(i,j) = (8)

Corollary: If $n = p^{2k}$ then there exists a family $\mathcal{M}_1$ of size
M = p - 1 with near-optimum cross-correlation $\sqrt{n}$. It is comparable
with the parameters of the Kasami family, but the auto-correlation is
perfect.

Now we describe sequences of length $n = p^{2k+1}$ with perfect
auto-correlation. Every integer s, $0 \le s \le n - 1$, can be represented
uniquely in the form $s = u p^{k+1} + v p^k + c$, where $0 \le u < p^k$,
$0 \le v < p$, $0 \le c < p^k$. For any c, let $(x_{v,c},\ 0 \le v < p)$ be a
sequence of length p with perfect auto-correlation. Let ζ be a primitive
root of unity of degree n, λ be a primitive root of unity of degree
p [...] be a primitive root of unity of degree $p^k$. Theorem 2: A


Let r denote the maximum nontrivial coefficient. If all sequences
have the same energy, say n, i.e., $R_0(i) = n$, then the
Sidelnikov-Welch lower bound [1],[2] is as follows


There are a lot of papers devoted to the design of families with
near-optimum correlation properties. Among others, the Gold
and Kasami families are well known. We propose two new ones.

A sequence is said to be perfect if all "out-of-phase"
autocorrelation coefficients are equal to zero.

Lemma 1 [3]: A sequence $X = (x_0, x_1, \ldots, x_{n-1})$ is a perfect
sequence if and only if all components of the sequence
$Y = (y_0, y_1, \ldots, y_{n-1})$ have the same magnitude $\sqrt{R_0(X)} = \sqrt{n}$,
where Y is the Discrete Fourier Transform (DFT) of X, i.e.,

$$y_t = \sum_{j=0}^{n-1} x_j\, \zeta^{jt},$$

where ζ is a primitive root of unity of degree n.

A sequence is known as a phase shift keyed (PSK) sequence if
all components of the sequence are on the unit circle.

The first family $\mathcal{M}_1$ consists of sequences X(m) of length $n = p^{2k}$,
where p is an odd prime. Any integer s, $0 \le s \le p^{2k} - 1$, can be
represented uniquely in the form

$$s = u p^k + v, \qquad (5)$$

where $0 \le u \le p^k - 1$, $0 \le v \le p^k - 1$.

Let ζ be a primitive root of unity of degree $p^{2k}$ and let λ be
a primitive root of unity of degree $p^k = \sqrt{n}$. Consider sequences
$X(m) = (x_0(m), x_1(m), \ldots, x_{n-1}(m))$, whose sth components are

$$x_s = x_v(m)\, \lambda^{muv}, \quad s = 0, 1, \ldots, n - 1, \qquad (6)$$

where (m, p) = 1, u and v are integers from Eqn. (5), and $x_v(m)$,
$0 \le v \le p^k - 1$, are arbitrary complex numbers with absolute value
1.

Lemma 2 [4]: Sequences (6) are perfect sequences.

(Note that if $x_v(m) = 1$ for all v then these sequences are the well
known Frank sequences [5].)

Theorem 1: Let $\mathcal{M}_1$ be the set of sequences (6), where m =
1, 2, ..., p - 1. Then all cross-correlation coefficients have the same
magnitude $p^k = \sqrt{n}$.
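
Lemma 2 and Theorem 1 can be checked numerically in the simplest case. The sketch below assumes the Frank-type form $x_s = \lambda^{muv}$ with all $x_v(m) = 1$, writes the length as $n = p^{2k}$ with $s = u p^k + v$, and uses p = 5, k = 1 as an illustrative choice; the function names are hypothetical.

```python
import cmath


def frank_like(p, k, m):
    """Sequence x_s = lambda^(m*u*v), s = u*p^k + v, with lambda a primitive
    root of unity of degree p^k (assumed form; all x_v(m) taken as 1)."""
    size = p ** k
    lam = cmath.exp(2j * cmath.pi / size)
    # Iterating u in the outer loop and v in the inner loop gives index u*size + v.
    return [lam ** ((m * u * v) % size) for u in range(size) for v in range(size)]


def periodic_corr(x, y, tau):
    """Periodic correlation: sum over s of x_s * conj(y_(s + tau))."""
    n = len(x)
    return sum(x[s] * y[(s + tau) % n].conjugate() for s in range(n))


P, K = 5, 1
N_LEN = P ** (2 * K)  # sequence length n = 25
family = {m: frank_like(P, K, m) for m in range(1, P)}

# Largest out-of-phase autocorrelation magnitude over the whole family.
worst_auto = max(
    abs(periodic_corr(x, x, tau)) for x in family.values() for tau in range(1, N_LEN)
)

# All cross-correlation magnitudes between distinct members.
cross_mags = [
    abs(periodic_corr(family[i], family[j], tau))
    for i in family
    for j in family
    if i != j
    for tau in range(N_LEN)
]
```

In this instance every out-of-phase autocorrelation vanishes up to rounding and every cross-correlation has magnitude 5 = √25.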


sequence  of  length  n  whose  sth  component  is  equal  to 

X,  -  x«,cA’“p'"',  0  <  s  <  n,  (9) 

has  the  perfect  auto-correlation. 

Proof. Straightforward calculation of the DFT of the sequence
(9) shows that all Fourier coefficients have the same magnitude.

Consider a family $\mathcal{M} = \{X(m)\}$, each element of which equals
$(x_0(m), x_1(m), \ldots, x_{p-1}(m))$, where x_s(m) = p”* , ρ is a primitive
root of unity of degree p. It is known that X(m), m = 1, 2, ..., p - 1,
are perfect sequences.

Lemma 3 [6]: All cross-correlation coefficients of sequences from
the family $\mathcal{M}$ have the same magnitude $\sqrt{p}$.

Consider a set $\mathcal{M}_2$ of p - 1 sequences of the form (9) where instead
of $x_{v,c}$ one uses $x_v(m)$ from the family $\mathcal{M}$.

Theorem 3: All cross-correlation coefficients of the family $\mathcal{M}_2$
have the same magnitude $\sqrt{n}$.
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Abstract 

Tht  theory  of  geometrically  uniform  (GV)  codes  is  applied  to  the  case  of 
multidimensional  (MD)  PSK  constellations.  The  symmetry  group  of  an  L  x 
MPSK  is  completely  charactemed.  Conditions  for  rotational  invariance  of 
GV  partitions  of  a  signal  constellation  are  illustrated.  Through  suitable  al¬ 
gorithms,  “good”  GV  partitions  of  L  x  M PSK  (M=d,8,16  and  L=l,i,S,4) 
constellations  are  found.  They  are  used  as  starting  points  tn  the  search  for 
good  GV  trellis  codes. 

1  GU  TCM  SCHEMES 

A signal set S is GU [1] if it has a transitive symmetry group $\Gamma(S)$, i.e. if for
any two points s and s' in S, there exists a symmetry of S that sends s to
s'. A generating group G(S) of S is a subgroup of $\Gamma(S)$ which is minimally
sufficient to generate S starting from an arbitrary initial point of it. The
MPSK constellation is GU, its symmetry group is isomorphic to the dihedral
group $D_M$ and, in the case of M even, the only two possible generating
groups are isomorphic to $Z_M$ and $D_{M/2}$. GU signal sets have the important
property that the Voronoi regions are congruent, so that the error probability
is independent of which signal was transmitted. In [1] this property was shown
to hold for signal sequences too, through a suitable extension of the concept
of geometrical uniformity. A normal subgroup G' of the generating group
G(S) induces a partition S/S' of the signal set S, in which each subset of
the partition is GU and has G' as a common generating group. A one-to-one
mapping is induced between the quotient group G/G' and the subsets of the
partition S/S'. If we combine a linear code over the label group $A \cong G/G'$,
i.e. a subgroup of $A^I$ (with I possibly infinite), with the mapping
$G/G' \to S/S'$ we obtain a GU code over S. As an example, a linear rate k/n
binary convolutional code may be used if $G/G' \cong (Z_2)^n$. The basis for a
GU TCM code with good properties in terms of minimum Euclidean distance is a
GU partition with a minimum squared Euclidean distance within signal sets at a
given partition level as large as possible.
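
One visible consequence of geometrical uniformity is that the set of distances to the other signal points looks the same from every point. The sketch below checks this for 8PSK; it illustrates only this consequence, not the group-theoretic definition, and the helper names and the three-point counterexample are assumptions made here.

```python
import cmath


def mpsk(M):
    """The M points of an MPSK constellation on the unit circle."""
    return [cmath.exp(2j * cmath.pi * i / M) for i in range(M)]


def distance_profile(points, index):
    """Sorted squared Euclidean distances from points[index] to all points,
    rounded so that floating-point noise does not affect comparisons."""
    ref = points[index]
    return sorted(round(abs(ref - q) ** 2, 9) for q in points)


psk8 = mpsk(8)
profiles_8psk = [distance_profile(psk8, i) for i in range(8)]

# A non-uniform set on a line, for contrast.
line = [0 + 0j, 1 + 0j, 3 + 0j]
profiles_line = [distance_profile(line, i) for i in range(3)]
```

All eight 8PSK profiles coincide, whereas the three collinear points give different profiles.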

2  GU  PARTITIONS  OF  MD  PSK  CONSTELLATIONS 

We denote a multidimensional PSK constellation obtained through the L-fold
Cartesian product of a 2D MPSK signal set with itself by
L×MPSK. It contains waveforms formed by L consecutive MPSK signals.
We prove that the symmetry group of L×4PSK constellations is isomorphic
to Sji • and that of L×MPSK, M even larger than 4, is isomorphic

to Sl • Starting from the symmetry group we develop an algorithm

able to construct all the possible generating groups of the constellation. In
this way we find generating groups which are not simple Cartesian
products of the generating groups of the constituent MPSK constellation. We
call $G = G_0/G_1/\ldots/G_{n-1}/G_n$ a binary partition chain of a group G with
$|G| = 2^n$ if $G_n, \ldots, G_1$ are normal subgroups of G and
$|G_p| = 2 \cdot |G_{p+1}|\ \forall p$.
In order to select “good” (in some sense) GU partition chains of the
constellation S, we need to associate to a given partition chain some important
parameters like: the minimum Euclidean intraset squared distance $\delta_p^2$ at the
p-th partition level, the isomorphism of both the normal subgroup generating
the partition and the quotient group, and the rotational invariance of the
partition chain at its various levels. Given S = MPSK we denote by $r_k$ the
rotation by $k \cdot 360/M$ degrees with respect to the origin and by $r_k^L$ the
symmetry of $S^L$ = L×MPSK obtained through the L-fold Cartesian product of
$r_k$ with itself. Introducing the subgroup of $\Gamma(S^L)$ called the Rotationally
Invariant subGroup:
$RIG(S^L) = \{1, r_1^L, (r_1^L)^2, \ldots, (r_1^L)^{M-1}\} = \langle r_1^L \rangle \cong Z_M$,
we say that a partition is congruent with respect to $r_k^L \in RIG(S^L)$ if $r_k^L$
induces a permutation among the partition subsets, and (rotationally) invariant
with respect to $r_k^L$ if this permutation reduces to the identity. Necessary and
sufficient conditions for the congruence and the invariance of a partition are
stated. When $RIG(S^L) \subset G(S^L)$ the partitions are automatically congruent with
respect to all $r_k^L \in RIG(S^L)$ and invariant with respect to $r_k^L$ iff
$r_k^L \in G_p$. An algorithm is illustrated which scans all the possible binary
partition chains starting from a given generating group G. It constructs the
tree of all possible binary partition chains induced by normal subgroups of G,
identifies each partition level through the aforementioned parameters (minimum
Euclidean distance, isomorphisms and rotational invariance), and chooses the
best partition chains as paths through the subgroup tree according to
optimality criteria related to the previous parameters. Every partition chain
is identified as in Table 1.

3  SEARCH  FOR  GOOD  GU  TCM  CODES 

The partition tables obtained are used to find “good” GU TCM schemes

* This work was supported by the Italian National Research Council (CNR) under “Progetto
based on binary as well as more general group convolutional codes. The
obtained codes, as well as their performance, are presented in [2] and [3]. As
an example, Table 2 presents the results of the search for binary 3x8PSK GU
codes transmitting 2.33 bit/T for increasing complexity. Some of
them improve over known non-GU codes. As for more general group codes,
Table 3 presents GU TCM codes for 3x8PSK based on the group Z| and
transmitting 2 bits/T. They show good characteristics both in terms of
Euclidean distance and rotational invariance. Error event probability curves
for these codes are shown in Figure 1.
Abstract

The encoding and decoding advantages of high-rate punctured
binary convolutional codes over memoryless channels are
well known. The puncturing technique is applied to Trellis-Coded
Modulation, resulting in simplified Viterbi decoding at the cost of
a small reduction in coding gain compared to usual Viterbi
decoding. Using computer search, short-memory rate 2/3 punctured
codes with the same minimum free Euclidean distance as
Ungerboeck's optimum codes have been found; these codes provide the
same error performance when decoded in the usual manner. In
addition to the decoding advantages, puncturing provides greater
flexibility, allowing an easy implementation of variable bandwidth
efficiency systems.

Summary 

Trellis-Coded Modulation (TCM), by using an expanded signal
set, can yield significant coding gains of 3 to 6 dB over uncoded
modulation without requiring more bandwidth [1]-[3]. A binary
convolutional code of rate R = m/(m+1) is used and the encoded
symbols are mapped into channel signals by following a set of rules
designed to maximize the Euclidean distance [1]. When decoding
TCM signals with the Viterbi algorithm [3], at each trellis level,
among the different paths merging into a given state, only the most
likely path, or survivor, is kept. For a rate R = m/(m+1) code,
selecting the survivor among the 2^m paths merging at each state
requires (2^m - 1) binary comparisons per state. If the number
of states is large and if the coding rate is high (i.e., m > 3),
then clearly, Viterbi decoding in this usual manner may become
impractical.

It is well known that for convolutional codes puncturing
allows considerable simplifications of the encoding and decoding
processes [4], [5]: decoding a rate R = m/(m+1) punctured code
requires only m binary comparisons instead of the (2^m - 1)
comparisons that are required by the usual decoder. As m increases, the
savings are substantial while resulting in only a slight performance
loss [4], [5] as compared to the best known R = m/(m+1) codes.
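
The size of the saving is easy to tabulate. The sketch below simply evaluates the two comparison counts quoted above for a rate m/(m+1) code, with function names introduced here for illustration.

```python
def comparisons_per_state(m):
    """Binary comparisons per state for a rate m/(m+1) code:
    (usual Viterbi decoder, punctured-code decoder)."""
    usual = 2 ** m - 1   # survivor chosen among 2^m merging paths
    punctured = m        # one comparison at each of m two-path steps
    return usual, punctured


def savings_factor(m):
    """Ratio (2^m - 1)/m of usual to punctured comparison counts."""
    usual, punctured = comparisons_per_state(m)
    return usual / punctured


# Comparison counts for a few rates, keyed by m.
table = {m: comparisons_per_state(m) for m in range(1, 7)}
```

For m = 2 (rate 2/3) the counts are 3 and 2, while for m = 6 they are 63 and 6, which shows how quickly the gap widens with the rate.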

In this paper we present an application of the same technique
to Trellis-Coded Modulation. If the underlying convolutional code
of the TCM scheme is a rate R = m/(m+1) code and if there
are no transmitted uncoded bits (i.e., no parallel transitions), a
code of the same rate can be obtained by puncturing an original
low-rate R = 1/m code. Naturally, the original low-rate code
and the puncturing pattern that will produce the TCM code with
the maximum free Euclidean distance have to be determined. A
computer search has provided codes with up to 64 states which,
when punctured, result in rate 2/3 codes for 8-PSK modulation with
the same free distance as Ungerboeck's codes. The advantage in
using this technique is that by changing the puncturing pattern only,
codes with coding rates R = 1/2, 2/3 or 3/4 can be easily obtained
from the same original code.

With punctured binary convolutional codes, simplified
decoding is obtained because at each state, whether it is an
“intermediate” state or a “true” state*, a decision about the survivor
can be made. Although it is not as straightforward as for binary codes,
the same process can be applied to TCM with a non-Ungerboeck set
partitioning method by using approximate metrics at intermediate
states. This results in the same complexity savings as for binary
convolutional codes but the coding gain is slightly lower than with
usual decoding: about 0.15 dB degradation for 64 states, 8-PSK
modulation. This degradation of the coding gain is caused by the
metrics approximation at intermediate states and tends to decrease
as the free Euclidean distance increases.

Reducing the decoding complexity of a high-rate code for TCM
is particularly important when the number of states is large. The
puncturing technique provides an attractive alternative to the usual
approach, allowing a reduction in the number of binary
comparisons by a factor of ((2^m - 1)/m) for a rate R = m/(m+1)
code. When the number of states becomes too large to be practical
for the Viterbi algorithm, the use of the puncturing technique and
suboptimum algorithms, such as sequential decoding or the
Adaptive Viterbi Algorithm [6], can be combined to reduce further the
complexity at a small cost to the error performance.
ABSTRACT

Design criteria for trellis codes with sequential decoding
are examined. A comparison of trellis codes with Optimum
Distance Profile (ODP) and Optimum Free Distance
(OFD) reveals that neither ODP nor OFD trellis codes
result in the best trade-off between error performance and
computational performance when sequential decoding is
used. A new approach is proposed to construct robustly
good trellis codes for use with sequential decoding. The
new codes obtained using this approach achieve nearly the
same free distances as the OFD codes and nearly the same
distance profiles as the ODP codes.

SUMMARY 

Most of the trellis codes constructed thus far have been
for use with the Viterbi decoding algorithm [1,2]. However,
the computational effort of the Viterbi algorithm grows
exponentially with the code constraint length ν. This limits
its application to codes with small values of ν and relatively
modest coding gains. On the other hand, sequential
decoding can perform almost as well as the Viterbi algorithm
and its computational complexity is essentially independent
of ν. Thus, more coding gain is possible when
larger constraint length codes are used with sequential
decoding. In [3,4], it has been shown that sequential decoding
is a good alternative to the Viterbi algorithm for
decoding trellis codes. However, no papers have addressed
the problem of constructing trellis codes for use with
sequential decoding. In this paper, trellis codes with Optimum
Distance Profile (ODP) and Optimum Free Distance
(OFD) are examined and design criteria for trellis codes
with sequential decoding are discussed. We show that neither
the ODP nor the OFD trellis codes provide the best
trade-off between distance profile and free distance. Thus,
a new algorithm is proposed to construct robustly good
trellis codes for use with sequential decoding.

First, trellis codes with optimum distance profiles were
constructed. In the construction algorithm, the free distance
was used as a secondary criterion, i.e., the code having
the larger free distance is retained whenever two codes
have the same distance profile. Compared with the Ungerboeck
codes, we found that the ODP trellis codes have
much smaller free distances for some constraint lengths.
For example, the free distance of ODP trellis coded 8-PSK
with ν = 7 is only 4.0 compared with 6.59 for the Ungerboeck
code. This results in a reduction of more than 2.0
dB in asymptotic coding gain. Thus, it appears that ODP
codes do not provide a good trade-off between free distance
and distance profile.
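The quoted loss of more than 2.0 dB can be checked directly from the two free distances. This is a minimal sketch that assumes the figures 4.0 and 6.59 are squared free Euclidean distances (as in the usual Ungerboeck tables), so that the gain ratio is 10 log10 of their quotient; the function name is hypothetical.

```python
import math


def gain_difference_db(d2_free_a: float, d2_free_b: float) -> float:
    """Difference in asymptotic coding gain (dB) between two codes.

    Assumes both arguments are squared free Euclidean distances taken
    at the same rate and the same average signal energy.
    """
    return 10.0 * math.log10(d2_free_a / d2_free_b)


if __name__ == "__main__":
    # Ungerboeck code versus ODP code, 8-PSK, constraint length 7.
    loss = gain_difference_db(6.59, 4.0)
    print(f"asymptotic coding gain loss of the ODP code: {loss:.2f} dB")
```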

We then conducted exhaustive searches for OFD trellis
codes in which the distance profile was used as a secondary
criterion. Our results indicate that the OFD trellis codes
do not provide the best trade-off between distance profile
and free distance, either. For example, the ODP and
OFD trellis coded 8-PSK with ν = 7 have distance profiles
(d_0, d_1, ..., d_7) = (2.0, 2.59, 2.59, 3.17, 3.17, 3.76, 3.76, 4.0)
and (2.0, 2.0, 2.59, 2.59, 2.59, 2.59, 3.17, 3.17), respectively.
Note that the OFD code has a much worse distance profile
than the ODP code.
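The comparison of the two profiles can be made mechanical. The sketch below assumes the usual convention that one distance profile is superior to another when it is larger at the first position where the two differ (a lexicographic comparison); that convention and the helper names are assumptions, not taken from the text.

```python
# Distance profiles (d_0, ..., d_7) quoted above for 8-PSK, constraint length 7.
ODP_PROFILE = (2.0, 2.59, 2.59, 3.17, 3.17, 3.76, 3.76, 4.0)
OFD_PROFILE = (2.0, 2.0, 2.59, 2.59, 2.59, 2.59, 3.17, 3.17)


def first_difference(p, q):
    """Index of the first position where two profiles differ, or None."""
    for i, (a, b) in enumerate(zip(p, q)):
        if a != b:
            return i
    return None


def is_superior(p, q):
    """True if profile p is lexicographically larger than profile q."""
    return tuple(p) > tuple(q)


if __name__ == "__main__":
    print("first differing index:", first_difference(ODP_PROFILE, OFD_PROFILE))
    print("ODP superior to OFD:", is_superior(ODP_PROFILE, OFD_PROFILE))
```

Under this ordering the two codes already separate at d_1, which is the kind of early-growth difference that matters for the computational load of a sequential decoder.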

Thus, we have constructed trellis codes which are neither
optimum free distance nor optimum distance profile.
We call the new codes robustly good trellis codes. Given
that a robustly good trellis code of constraint length ν has
been found, the approach used to find a constraint length
ν + 1 robustly good trellis code is to find the code that
improves the free distance or the distance profile of the
constraint length ν code, with priority given to improving
the free distance. In other words, we try to find a
longer code which has a free distance or a distance profile
superior to or identical to the shorter one. Systematic
feedback 8-PSK and 16-QAM robustly good trellis codes
with ν up to 15 and asymptotic coding gains up to 6.66 dB
are obtained using this approach. Compared to ODP and
OFD trellis codes, the robustly good trellis codes provide
a much better trade-off between free distance and distance
profile. Indeed, the new codes achieve nearly the same free
distances as the OFD codes and nearly the same distance
profiles as the ODP codes.
Practical Trellis Coded Modulation with Punctured
Rate-2/3 Convolutional Codes

Stephen K. How

General Instrument, San Diego, CA

Abstract - A trellis coded modulation scheme is described
which uses a punctured rate-2/3 convolutional code to
simplify decoder implementation. Simulations of a code
based on a punctured 64-state rate-1/2 trellis show
comparable performance to a 32-state Ungerboeck code
(rate-2/3 trellis).


Two-dimensional trellis coded modulation (TCM) achieves
up to 6 dB coding gain by partitioning the symbol
constellation 8 ways, which increases uncoded symbol
spacing by 2√2. An Ungerboeck code maps two coded bits
to the 8 subsets using a rate-2/3 convolutional code [1]. In
high-speed decoders, the Viterbi algorithm (VA) is
implemented in a parallel manner, and the complexity of the
2/3 trellis can limit the number of states to 16. In satellite
applications 64-state rate-1/2 Viterbi decoders are commonly
built as ASICs.

A rate-1/2 trellis with every other branch punctured
specifies a rate-2/3 code. A trellis code with the same
spectral efficiency as an Ungerboeck code needs to transmit
2 coded bits / symbol. Two bits are encoded in two steps
through the rate-1/2 trellis, generating an unpunctured and
punctured branch.

Figure 1. Punctured 4-state trellis

In the punctured TCM scheme, these 2 branches are
mapped to a subset of an 8-way partitioned QAM
constellation. A mapping and decoding method are needed
to assign large Euclidean distances to the error events of the
punctured trellis. In the example of the 4-state trellis in
figure 1, the unpunctured branch output defines the 2 MSBs
and the punctured branch defines the LSB of the symbol
index in figure 2. In this manner, 2 coded bits define a
subset. Uncoded bits define the subset member.

In decoding, the branch metrics for the unpunctured trellis
step are first computed and applied to the VA. Using the
same receive symbol, the punctured metrics are then
computed to decode the second coded bit. The symbol
mapping and decoding are an attempt to orthogonalize these
two steps. The first set of branch metrics are computed
approximately as the minimum squared distance between the
receive symbol and the subset points grouped by index
MSBs. I.e., BM_00 = min {d^2(A_n, s), s ∈ A∪B}, BM_01 = min
{d^2(A_n, s), s ∈ C∪D}, etc., where A_n is the receive symbol, and
d^2( , ) is squared Euclidean distance. Actual metrics are
based on the log of conditional probabilities. The punctured
metrics used in the second trellis step are computed
similarly, with the same receive symbol and different subset
unions. In figure 1 two types of punctured branches are
distinguished to enhance the Euclidean distance at this trellis
step. The branches labeled by puncture "x" (lower-case) are
"continuations" of the 3-bit branches 00- and 11-. Also
puncture "X" implies the branch is the LSB of subset indexes
01- and 10-. Accordingly, BM_0x = min {d^2(A_n, s), s ∈ A∪G},
BM_1x = min {d^2(A_n, s), s ∈ B∪H}, BM_0X = min {d^2(A_n, s),
s ∈ C∪E}, and BM_1X = min {d^2(A_n, s), s ∈ F∪D}. This
distinction between punctured branches provides a larger
Euclidean distance mapped to subset LSB, as shown by the
shaded and unshaded sets in figure 2.

Figure 2. 16QAM partition
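The two-step metric computation described above can be sketched as follows. The subset layout used here is hypothetical: it is one possible 8-way partition of 16QAM into pairs of points, chosen only so that the example runs, and it is not the labeling of figure 2. The subset unions (A∪B and C∪D for the first step; A∪G, B∪H, C∪E and F∪D for the punctured step) are the ones named in the text, and plain squared Euclidean distance stands in for the log-probability metrics a real decoder would use.

```python
# Hypothetical 8-way partition of 16QAM (coordinates in {-3, -1, 1, 3}).
# The letter-to-point assignment is an assumption made for illustration.
SUBSETS = {
    "A": [(-3, -3), (1, 1)],
    "B": [(-3, -1), (1, 3)],
    "C": [(-1, -3), (3, 1)],
    "D": [(-1, -1), (3, 3)],
    "E": [(-3, 1), (1, -3)],
    "F": [(-3, 3), (1, -1)],
    "G": [(-1, 1), (3, -3)],
    "H": [(-1, 3), (3, -1)],
}


def sq_dist(r, s):
    """Squared Euclidean distance between receive symbol r and point s."""
    return (r[0] - s[0]) ** 2 + (r[1] - s[1]) ** 2


def branch_metric(r, names):
    """Minimum squared distance from r to the union of the named subsets."""
    return min(sq_dist(r, s) for name in names for s in SUBSETS[name])


def unpunctured_metrics(r):
    """First trellis step: the two subset unions named in the text."""
    return {"00": branch_metric(r, "AB"), "01": branch_metric(r, "CD")}


def punctured_metrics(r):
    """Second trellis step, same receive symbol, different unions."""
    return {
        "0x": branch_metric(r, "AG"),
        "1x": branch_metric(r, "BH"),
        "0X": branch_metric(r, "CE"),
        "1X": branch_metric(r, "FD"),
    }


if __name__ == "__main__":
    r = (1.0, 1.0)
    print(unpunctured_metrics(r))
    print(punctured_metrics(r))
```

Both metric sets are derived from the same receive symbol, which is the point of the scheme: a rate-1/2 Viterbi decoder is stepped twice per symbol.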

The described mapping approximates an assignment of
2√2 Δ0 to 3-bit branch 111 (from 000), √2 Δ0 to 110 and 001,
and Δ0 to 010, 101, 100, and 011. From this assumption, an
optimal punctured 64-state rate-1/2 code was found to be
(101, 109, 101) (octal), yielding d^2_free = 7, or 5.45 dB
asymptotic coding gain. Figure 3 shows simulated
performance of the punctured TCM approach vs.
Ungerboeck codes for 16QAM.

[1] G. Ungerboeck, "Trellis-coded Modulation with
Redundant Signal Sets Part II", IEEE Communications
Magazine, vol. 25, no. 2, Feb. 1987.


Figure 3. Comparison of punctured with Ungerboeck codes
(horizontal axis: Es/N0 in dB)
Design of Optimal Filters for Use as Bandwidth-Efficient
Coded Modulation

Amir  Said*  and  John  B.  Anderson 

Department  of  Electrical,  Computer,  and  Systems  Engineering 
Rensselaer  Polytechnic  Institute,  Troy,  NY  12180 


Recent work has shown that in channels with intersymbol
interference (ISI), whether finite or infinite response, it is possible to
achieve almost maximum-likelihood detection performance with
reduced-search algorithms. The number of operations required
by those algorithms may be orders of magnitude smaller than
that required by the Viterbi algorithm. Hence, controlled ISI,
such as that introduced by a band-limitation filter, can be used
to improve performance without an exponential increase in the
detection complexity. This is actually a form of coding where,
for a fixed noise immunity performance, the gains are measured
in bandwidth reduction.

In a typical application of bandwidth-efficient coded
modulation, ISI may be introduced by the non-ideal response of the
channel and by intentional filtering at the modulator output to
constrain the bandwidth. This is modeled in Fig. 1. The filter
f(n) may comprise an explicit coded modulation, for which we
seek the optimal design. We propose a method that
simultaneously constrains the bandwidth and maximizes the minimum
Euclidean distance between signals. We show that it can be
formulated as a linear program; and it allows uncoded or
trellis coded data, filters with infinite impulse response, and many
types of spectrum shaping constraints (e.g., zeros at f = 0 or
Chebyshev filters). The proposed filter can also be considered a
convolutional coder that matches the code, output and channel
filters for better performance.

In Fig. 1, the discrete-time FIR filter is used for spectral
shaping in the Nyquist frequency interval and to maximize the
minimum Euclidean distance. The modulator output filter is
used to steeply attenuate the frequencies outside the desired
bandwidth; and h_c(t) is the response of the linear channel. For
now, we use a simple definition of the bandwidth W, where a
fixed and small fraction of the modulator output power is outside
the frequency interval [-W, W].

Mathematical  Formulation 

Here we assume an ideal channel, i.e., $h_c(t) = \delta(t)$, but the
generalization is straightforward. The impulse response

$$h(t) = \sum_{n \ge 0} f(n)\, h_s(t - nT),$$

is used to define the modulator output

$$s(t) = \sum_n u(n)\, h(t - nT),$$

where $u(n)$ is the complex data sequence.

We define the correlations

$$g_f(n) = \sum_k f(k + n)\, f^*(k),$$

$$g_s(n) = \int_{-\infty}^{\infty} h_s(t + nT)\, h_s^*(t)\, dt.$$

*This research was partially supported by CNPq - Conselho Nacional
de Desenvolvimento Científico e Tecnológico, Brazil.


Figure 1: Modulator/channel model (blocks: modulator, channel).


The objective is to find the filter taps, $f(n)$, that maximize
the minimum Euclidean distance subject to the bandwidth
constraint, but the problem is solved by finding the optimal
correlation $g_f(n)$; the optimal $f(n)$ can be obtained from $g_f(n)$ via
spectral decomposition.

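It can be shown that the squared Euclidean distance between
a transmitted and an erroneous sequence is

$$d^2(e) = \sum_n g_f(n)\, \mu_e^*(n),$$

where $e(n)$ is the difference of the two data sequences and

$$\mu_e(n) = \sum_k g_s(n - k) \sum_m e(m + k)\, e^*(m).$$

The average energy per symbol is set by the linear constraint

$$\sum_m g_f(m)\, g_s^*(m) = 1,$$

and the fraction of the power inside the bandwidth is

$$\sum_n g_f(n)\, r^*(n), \qquad r(n) = \int_{-W}^{W} |H_s(f)|^2\, e^{j 2\pi f n T}\, df,$$

with $H_s(f)$ the Fourier transform of $h_s(t)$.

Finally, $g_f(n)$ is a correlation sequence only if

$$\sum_n g_f(n)\, e^{-j 2\pi f n} \ge 0 \quad \text{for all } f \in [0, 1).$$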
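The role of the correlation sequence can be checked numerically in the simplest setting. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes an ideal channel and orthonormal shifted pulses, so that the pulse correlation is a unit impulse and the distance reduces to the sum over n of g_f(n) times the conjugate of the error-sequence correlation. It then compares that sum with the energy of the filtered error sequence computed directly. All function names are hypothetical.

```python
def correlation(seq):
    """Return {n: sum_k seq[k + n] * conj(seq[k])} for all lags n."""
    length = len(seq)
    out = {}
    for n in range(-(length - 1), length):
        out[n] = sum(
            seq[k + n] * complex(seq[k]).conjugate()
            for k in range(length)
            if 0 <= k + n < length
        )
    return out


def distance_from_correlations(f, e):
    """Squared distance as sum_n g_f(n) * conj(rho_e(n)).

    Assumes the pulse correlation g_s(n) is a unit impulse, so that
    mu_e(n) equals the error-sequence correlation rho_e(n).
    """
    g_f = correlation(f)
    rho_e = correlation(e)
    total = sum(g_f[n] * complex(rho_e[n]).conjugate() for n in g_f if n in rho_e)
    return complex(total).real


def distance_direct(f, e):
    """Energy of the convolution of the taps f with the error sequence e."""
    out = [0] * (len(f) + len(e) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(e):
            out[i + j] += a * b
    return sum(abs(v) ** 2 for v in out)


if __name__ == "__main__":
    taps = [1, 1j]
    error = [1, 1j]
    print(distance_from_correlations(taps, error), distance_direct(taps, error))
```

Because the distance depends on the taps only through g_f(n), and does so linearly, the search can be carried out over g_f(n) with linear constraints, which is what makes a linear programming formulation possible.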

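In a practical solution method, we use sets with a small
number of carefully chosen error sequences ($\mathcal{E}$) and frequency points
($\mathcal{P}$). The resulting linear program is:

$$d^2_{\min,\mathrm{opt}} = \max\, d^2$$

subject to

$$\sum_n g_f(n)\, \mu_e^*(n) \ge d^2 \quad \text{for all } e \in \mathcal{E},$$

$$\sum_n g_f(n)\, g_s^*(n) = 1,$$

$$\sum_n g_f(n)\, r^*(n) = \text{(prescribed in-band power fraction)},$$

$$\sum_n g_f(n)\, e^{-j 2\pi f n} \ge 0 \quad \text{for all } f \in \mathcal{P}.$$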

We  present  results  on  a  variety  of  code-filters  designed  by 
this  procedure.  The  decoding  complexity  was  measured  by  M- 
algorithm  tests. 
On a Class of Constant Envelope Continuous
Phase Modulation Schemes, Obtained by
Imposing Continuous Phase Transitions on
Trellis Coded Asymmetric PSK

Johan U<14n, Göran Lindell
Telecommunication Theory
Lund University, Box 118, S-221 00 Lund, Sweden

Abstract 

The problem of how to construct information carrying continuous
phase functions, which yield power and bandwidth efficient schemes,
is addressed. The additive white Gaussian noise channel and coherent
maximum likelihood sequence detection are assumed. Our approach
is to use trellis coded asymmetric PSK schemes of low complexity and
with good distance properties. Consecutive phase values of the phase
sequences generated by these schemes are interconnected by a
continuous function. Thus, a continuous phase function is obtained. In
conventional full response CPM, the Euclidean distance is bounded
by an error event two symbols long. The continuous phase schemes
obtained here have a shift-register state trellis structure. This
guarantees long error events, thus the schemes have a potential for large
Euclidean distances related to the number of states in the trellis.
There are schemes within this class with power and bandwidth
efficiencies ($d^2_{\min}$ versus 99% bandwidth) which are very good,
considering the low complexity of the schemes. They are, in fact, competitive
with, and sometimes better than, some of the best coded continuous
phase modulated schemes of comparable complexity previously
published.

System  Description 

Our approach is to start with good coded asymmetric PSK schemes
[1] having a shift register state trellis. By asymmetric is meant that
the phase values used are nonuniformly spaced, i.e., for asymmetric
4-PSK the set $\{0, \phi, \pi, \pi + \phi\}$ is used. Consecutive phase values of the
phase sequences generated by these schemes are interconnected by a
continuous function. Thus, a continuous phase function is obtained,
see fig. 1. The shape of the phase transitions is the same in every
symbol interval, but the amount of change in the phase during a
symbol interval depends on the current data and the state of the
encoder. The continuous phase function can be written as

$$\phi(t, U) = 4\pi \sum_{n=-\infty}^{\infty} h(U_n, \sigma_n)\, q(t - nT_s).$$

$U_n \in \{0, 1, \ldots, M - 1\}$ is the data
that arrives at the modulator at $t = nT_s$; $\sigma_n$ is the state of the
encoder at $t = nT_s$; $q(t)$ is the phase response and equals 0 when
$t < 0$ and 1/2 when $t > T_s$. The amount of change in the phase
during a symbol interval is $2\pi h(U_n, \sigma_n)$. $h(U_n, \sigma_n)$ is the modulation
index associated with the transition in the trellis caused by the data
$U_n$ when the encoder is in the state $\sigma_n$. The choice $h(U_n, \sigma_n) =
U_n h$ renders a scheme in the traditional CPM class, but in general
$h(U_n, \sigma_n)$ is a nonlinear function. The transmitted signal is $s(t, U) =
\sqrt{2E/T_s} \cos(2\pi f_0 t + \phi(t, U))$; $f_0$ is the carrier frequency, assumed to be
much larger than $1/T_s$.
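The phase function can be evaluated numerically once a phase response is fixed. The sketch below assumes a linear (1REC-type) phase response that ramps from 0 to 1/2 over one symbol interval, a finite sequence of modulation indices starting at n = 0, and the form phase = 4π Σ h_n q(t - nT_s), which gives a phase change of 2π h_n per symbol interval as stated in the text. The function names and the sample index values are hypothetical.

```python
import math


def q_linear(t: float, ts: float) -> float:
    """Linear phase response: 0 for t < 0, 1/2 for t > ts, ramp in between."""
    if t < 0:
        return 0.0
    if t > ts:
        return 0.5
    return t / (2.0 * ts)


def phase(t: float, h_seq, ts: float) -> float:
    """Continuous phase 4*pi*sum_n h_n * q(t - n*ts) for a finite h_seq."""
    return 4.0 * math.pi * sum(
        h * q_linear(t - n * ts, ts) for n, h in enumerate(h_seq)
    )


if __name__ == "__main__":
    indices = [0.25, 0.5]  # modulation indices on two consecutive transitions
    for t in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(f"t = {t:.1f}: phase = {phase(t, indices, 1.0) / math.pi:.3f} pi")
```

The phase is continuous at the symbol boundaries because each term of the sum is itself a continuous function of t.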

When continuous phase is imposed on a coded PSK scheme,
consecutive values of $h(U_n, \sigma_n)$ are chosen so that $\phi(t, U)$ coincides
modulo $2\pi$ with the values of the original phase sequence at the end of
each symbol interval. There are several, in fact infinitely many,
possible choices of the modulation index for a specific phase transition.
An extra M-ary delay element is needed in the shift-register, and the
number of states in the new trellis, S, is M times larger than in the
original one. The extra delay element is necessary because
continuous phase demands knowledge of the phase both at the beginning of
the current symbol interval and at the beginning of the next symbol
interval, see fig. 1 and 2.
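The statement that infinitely many modulation indices realize the same phase transition follows from the phase change per interval being 2π h: adding any integer to h leaves the end phase unchanged modulo 2π. A minimal sketch, with a hypothetical function name and an arbitrary choice of which integer offsets to list:

```python
import math


def candidate_indices(theta_start: float, theta_end: float, offsets=(-1, 0, 1)):
    """Modulation indices h with theta_start + 2*pi*h == theta_end (mod 2*pi).

    The base value lies in [0, 1); each integer offset k gives another
    valid index h = base + k.
    """
    base = ((theta_end - theta_start) % (2.0 * math.pi)) / (2.0 * math.pi)
    return [base + k for k in offsets]


if __name__ == "__main__":
    # Transition from phase 0 to phase pi/2.
    for h in candidate_indices(0.0, math.pi / 2):
        end = (2.0 * math.pi * h) % (2.0 * math.pi)
        print(f"h = {h:+.2f}: end phase = {end / math.pi:.2f} pi (mod 2 pi)")
```

Which of the candidates is used affects the bandwidth and the distance of the resulting scheme, so the choice is a design parameter rather than a detail.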

Within the obtained class of continuous phase modulated schemes,
schemes of low complexity having power and bandwidth efficiency
comparable to, and sometimes better than, the schemes given in
[2,3,4] can be found. Examples of results for the symmetric case
($\phi = \pi/2$) are given in figure 3.


Figure 2. a) Trellis and encoder of a coded asymmetric PSK scheme,
see ref [1]. b) Trellis and encoder when continuous phase is imposed
on the scheme in a). The label on a state transition is the modulation
index used for that transition. The start phase is given as part of the
state.


Figure 3. Asymptotic power gain over MSK, 10 log10(d^2_min/2) [dB],
plotted against the 99% in-band power bandwidth. Schemes using the
same set of modulation indices, but different frequency pulses, are
connected with straight lines ('+' 1REC, 'o' 1HCS and 'x' 1RC). The
1REC schemes on the dashed lines have the same efficiency as schemes
of the same complexity given in [2,3].
CODED MODULATION WITH INTERBLOCK MEMORY


Shang-Chih Ma, Mao-Chao Lin
Department of Electrical Engineering
National Taiwan University
Taipei, Taiwan, Republic of China

Abstract 

In this paper, we introduce a new Trellis Coded Modulation
scheme with a two-fold dependency between signal
points. In our coding scheme, in addition to the dependency
among coded multi-dimensional signal points described
by the trellis, each coded multi-dimensional signal
point has another kind of dependency on one previously
coded multi-dimensional signal point.

1  Preliminaries 

Let $C_a$ and $C_{aa}$ be $(n, k_a, d_a)$ and $(n, k_a + r, d_{aa})$ binary codes
with generator matrices $G_a$ and $[G_a^T\ G_{ar}^T]^T$ respectively. Also, let
$C_b$ and $C_{bb}$ be $(n, k_b, d_b)$ and $(n, k_b + r, d_{bb})$ binary codes with
generator matrices $G_b$ and $[G_b^T\ G_{br}^T]^T$ respectively. We can construct
a $(2n, k_a + k_b + r)$ binary code $C$ with generator matrix of the
following form:

$$G = \begin{bmatrix} G_{br} & G_{ar} \\ 0 & G_a \\ G_b & 0 \end{bmatrix},$$

where each 0 represents an all zero matrix.

Consider a BCM scheme with interblock memory [3]. Each
two-dimensional signal symbol in the two-dimensional signal
space $W_0$ is labeled by three bits $(a, b, c)$ as shown in Figure 1.
Let $v = (a_1, b_1, c_1, \ldots, a_n, b_n, c_n)$ and $v' = (a'_1, b'_1, c'_1, \ldots, a'_n, b'_n, c'_n)$
represent two consecutively encoded 2n-dimensional signal points
in $V_0$. The combination of two adjacent 2n-dimensional signal
points, represented by $(v, v')$, may be called a superblock. In our
scheme, $(c_1, \ldots, c_n)$ and $(c'_1, \ldots, c'_n)$ are codewords of an $(n, k_c, d_c)$
binary linear code $C_c$. Moreover, $(b_1, \ldots, b_n, a'_1, \ldots, a'_n)$ is a
codeword in $C$. Let $u = (\hat{a}_1, \hat{b}_1, \hat{c}_1, \ldots, \hat{a}_n, \hat{b}_n, \hat{c}_n)$ and
$u' = (\hat{a}'_1, \hat{b}'_1, \hat{c}'_1, \ldots, \hat{a}'_n, \hat{b}'_n, \hat{c}'_n)$ be combined to represent
another superblock. Suppose that $(a_1, \ldots, a_n) = (\hat{a}_1, \ldots, \hat{a}_n)$. If the
condition $\min\{0.8\, d_{aa} + 1.6\, d_{bb},\ 0.8\, d_a,\ 1.6\, d_b\} \ge 3.2\, d_c$ is satisfied,
the MSED between coded signal superblocks represented by $(v, v')$ and
$(u, u')$ is $3.2\, d_c$.

Example 1: Let n = 4. Let $C_a$, $C_{aa}$, $C_b$, $C_{bb}$ and $C_c$ be (4,1,4),
(4,3,2), (4,2,2), (4,4,1) and (4,4,1) binary linear block codes
respectively. As a result, the MSED is 3.2. The
average coding rate is 9/4 information bits per two-dimensional
signal symbol. Compared to uncoded QPSK, the asymptotic coding
gain is 2.55 dB.

2  The  Proposed  Coded  Modulation 
Scheme 

We now illustrate the procedure of introducing interblock memory
to the TCM constructed from BCM by modifying example 1.

Let $V_1$ represent a 16-dimensional signal space, in which each
16-dimensional signal point is labeled by $(a_1, b_1, c_1, \ldots, a_4, b_4, c_4,
a'_1, b'_1, c'_1, \ldots, a'_4, b'_4, c'_4)$, where $a_1$, $a_2$, $a_3$ and $a_4$ are fixed. Here the
two blocks $(a_1, b_1, c_1, \ldots, a_4, b_4, c_4)$ and $(a'_1, b'_1, c'_1, \ldots, a'_4, b'_4, c'_4)$ are
separated by 18 blocks. Hence, these two blocks are not adjacent.
Since the 2-dimensional signal space $W_0$ is the 8-AMPM
signal space, we see that $V_1$ is a (8,20,0.8) block modulation code.


Let $V_3$ be a subset of $V_1$, for which the partial labeling $(b_1, \ldots,
b_4, a'_1, \ldots, a'_4)$ of each 16-dimensional signal point is a codeword
of $C$. Thus, $V_3$ is a (8,17,3.2) block modulation code. We may
partition $V_1$ into the disjoint union of 8 cosets of $V_3$.

Let $V_2$ be a subset of $V_1$. It can be constructed such that $V_2$ is a
(8,19,1.6) block modulation code. The partition chain $V_1/V_2/V_3$
has increasing intraset MSED of 0.8, 1.6 and 3.2 respectively.
With the partition chain of $V_1/V_2/V_3$, we can design an efficient
TCM constructed from BCM with additional interblock memory.
During the encoding, each time we encode an 11-bit message $m =
(m_1, m_2, \ldots, m_{11})$ into an 8-dimensional signal point represented
by $(a_1, b_1, c_1, \ldots, a_4, b_4, c_4)$, where $(a_1, \ldots, a_4)$ was determined in an
earlier encoding time. In the meantime, the $(a'_1, \ldots, a'_4)$ part of
another 8-dimensional signal point is also determined for later
usage. The message bits $m_6$ and $m_7$ are used as the input of a
(3,2,3) convolutional code encoder and generate the output bits
$u_0, u_1, u_2$, which are then used to select one of the eight cosets
of $C$. The message bits $m_1, m_2, \ldots, m_5$ are used to choose a
codeword $(b_1, \ldots, b_4, a'_1, \ldots, a'_4)$ from the selected coset of $C$. The
message bits $m_8, \ldots, m_{11}$ are used to determine the codeword
$(c_1, \ldots, c_4)$ of $C_c$. In the trellis, the branches emanating from
the same state or merging into the same state all belong to the
same coset of $V_2$. Thus, the MSED between any two code paths
is 3.2. The coding rate of this coded modulation scheme is 11/4
bits per two-dimensional signal symbol. Compared to uncoded
QPSK, the asymptotic coding gain is 3.42 dB.
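Both quoted coding gains can be reproduced from the MSED and the coding rate. The sketch below assumes that all constellations are normalized to unit average energy per two-dimensional symbol, that uncoded QPSK then has squared minimum distance 2 at 2 bits per symbol, and that the gain is the ratio of MSED times rate (distance per unit of bit energy). These normalizations are assumptions chosen because they reproduce the figures in the text; the function name is hypothetical.

```python
import math


def asymptotic_gain_db(msed: float, rate: float,
                       ref_msed: float = 2.0, ref_rate: float = 2.0) -> float:
    """Asymptotic coding gain in dB relative to a reference scheme.

    msed is the minimum squared Euclidean distance at unit average
    symbol energy; rate is in information bits per 2-D symbol. The
    defaults describe uncoded QPSK under the same normalization.
    """
    return 10.0 * math.log10((msed * rate) / (ref_msed * ref_rate))


if __name__ == "__main__":
    print(f"Example 1 (rate 9/4):  {asymptotic_gain_db(3.2, 9 / 4):.2f} dB")
    print(f"Section 2 (rate 11/4): {asymptotic_gain_db(3.2, 11 / 4):.2f} dB")
```

The two schemes have the same MSED of 3.2, so the whole difference of about 0.87 dB comes from the higher rate of the scheme with the convolutional code.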
Abstract

Sequential decision algorithms are investigated for individual data
sequences, with various application areas. Simple universal schemes are known
to approach optimality as fast as $n^{-1} \log n$, where n is the sample size. For the
finite-alphabet case, schemes that are implementable by finite-state machines
(FSM's) are studied. It is shown that Markovian machines with sufficiently long
memory are nearly as good as any randomized FSM. For the continuous-valued
case, a useful class of parametric schemes is discussed with application to the
recursive least squares (RLS) algorithm.

Summary

Various problems in information theory and signal processing are
associated with selecting a good strategy $b_t$ for minimizing an additive loss function
$\sum_{t=1}^{n} l(b_t, x_t)$. While the data $x_1, x_2, \ldots$ normally flows sequentially, the best
strategy (within some class) for this sequence depends on the entire sequence, and
hence cannot be anticipated. Nevertheless, it has been observed in some
applications that applying the best strategy for the data observed so far is asymptotically
as good as the best fixed strategy that could have been chosen in retrospect.
Moreover, the performance of this dynamic policy is within $O(n^{-1} \log n)$ close
to optimality, uniformly for every possible n-sequence.

One example is sequential universal data compression. Let $x_1, x_2, \ldots$
be a binary string. Let $n_t(0)$ and $n_t(1)$ denote counts of '0' and '1', respectively,
along the first t symbols. Define $p_t(x) = (n_t(x) + 1/2)/(t + 1)$, x = 0, 1, as the
respective empirical probabilities of '0' and '1'. Then, it is well known that

$$-\frac{1}{n} \sum_{t=0}^{n-1} \log p_t(x_{t+1}) = H(x^n) + \frac{1}{2}\,\frac{\log n}{n} + O(n^{-1}). \qquad (1)$$

The left hand side is the normalized length of a codeword associated with a
sequential Shannon encoder based on current empirical letter probabilities from
data observed so far. The first term on the right is the empirical entropy associated
with $x^n$, which corresponds to the minimum normalized codeword length
associated with a fixed codebook that one could have achieved if he knew a priori
$\{p_n(x)\}_{x=0}^{1}$. The $O(n^{-1} \log n)$ term is the loss in performance due to
sequentiality. Eq. (1) can be formalized as a sequential minimization problem, where
$l(b, x) = -\log b$ for x = 0 and $-\log(1 - b)$ for x = 1, and where $b \in (0, 1)$.
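The behaviour described by Eq. (1) can be observed on a short binary string. The sketch below is illustrative only: it uses the add-1/2 estimate p_t(x) = (n_t(x) + 1/2)/(t + 1) defined above to predict each next symbol, accumulates the ideal code length in bits (base-2 logarithms are an assumption), and compares it with the empirical entropy of the whole string. Function names are hypothetical.

```python
import math


def sequential_code_length(bits) -> float:
    """Normalized ideal code length (bits/symbol) of the sequential coder.

    Symbol x_{t+1} is coded with p_t(x) = (n_t(x) + 1/2) / (t + 1),
    where n_t counts occurrences among the first t symbols.
    """
    counts = [0, 0]
    total = 0.0
    for t, x in enumerate(bits):
        p = (counts[x] + 0.5) / (t + 1)
        total += -math.log2(p)
        counts[x] += 1
    return total / len(bits)


def empirical_entropy(bits) -> float:
    """Empirical (zeroth-order) entropy of the string in bits/symbol."""
    n = len(bits)
    p1 = sum(bits) / n
    return -sum(p * math.log2(p) for p in (p1, 1.0 - p1) if p > 0)


if __name__ == "__main__":
    data = [0, 1] * 50
    n = len(data)
    redundancy = sequential_code_length(data) - empirical_entropy(data)
    print(f"redundancy = {redundancy:.4f} bits/symbol")
    print(f"(1/2) log2(n) / n = {0.5 * math.log2(n) / n:.4f}")
```

The measured redundancy sits slightly above (1/2) log2(n)/n, the gap being the O(1/n) term of Eq. (1).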

Another application of (1) is sequential gambling where at each round t the
player doubles the fraction of the current capital $S_t$ wagered on the next outcome,
i.e., $S_{t+1} = 2bS_t$ if $x_{t+1} = 0$ and $S_{t+1} = 2(1 - b)S_t$ if $x_{t+1} = 1$. It is easy to see
that the exponential growth rate $n^{-1} \log S_n$ of the capital is the average of
$1 - l(b_t, x_t)$, where $l(\cdot, \cdot)$ is as before.

Portfolio selection for optimal investment is an extension of the above
described gambling problem, where $S_t$ is distributed over m investment
opportunities according to some portfolio $b \in R^m$, a vector of weights summing to
unity. The stock market at day t is given by a vector $x_t \in R^m$ with components
$x_{t,i}$ representing the return per monetary unit allocated to stock i at day t. The
yield per unit invested is the weighted average of return ratios, i.e., $b^T x_t$, where
T denotes transposition. Thus, the exponential growth rate $n^{-1} \log S_n$ of the
capital is the time-average of $l(b, x_t) = \log(b^T x_t)$. In [1] a sequential portfolio
selection scheme is proposed for bounded market vector sequences, which is
again as good as the optimal fixed investment policy up to a term of
$O(n^{-1} \log n)$. The proof in [1], however, relies heavily on special properties of
the function $\log(b^T x)$.

In [2] a result in the same spirit is established for prediction of binary
sequences, where predictors are sought that uniformly minimize the fraction of
errors. The strategy $b_t$ is an estimate $\hat{x}_{t+1}$ of $x_{t+1}$ and $l(\hat{x}_{t+1}, x_{t+1})$ is the
indicator function of an error. Again, the techniques in [2] are specific to this particular
loss function.

These examples are all special cases of the sequential compound decision
problem (SCDP), which was first presented by Robbins [3] and has been
thoroughly investigated since then by many researchers. The setup of the SCDP
is more general because it assumes that the observer sees noisy versions of $\{x_t\}$.
Upper bounds have been developed in the literature (see, e.g., references in [4])
on the decay rate of the difference between the average loss associated with the
best sequential strategy and that of the best fixed strategy. The scope of these
results has been later extended (see, e.g., [5]) and sequential decision procedures
have been developed whose performance is nearly as good as that of the best kth
order Markovian (rather than fixed) strategy, i.e., the best strategy that depends
on the k preceding outcomes. While the Markovian strategy is plausible when the
sequence has a "Markov structure" [5], it has not yet been justified rigorously for
a general sequence.

Our first result serves as a step towards such a justification. For simplicity,
we assume $\{x_t\}$ to be directly accessible (without noise) as in the above
examples, and we consider strategies that are implementable by a deterministic M-state
machine. We extend Theorem 2 of [6] and show that for a sufficiently large k
(independently of the data) and any M-state machine, the best kth order Markov
machine performs within ε as good as the M-state machine. This means that in
the limit as k → ∞, a Markovian machine is as good as the best deterministic FSM.
As a result, one can gradually increase the Markov order at a logarithmic rate
independently of the particular sequence, and guarantee convergence to the limit
as M → ∞ of the minimum loss attainable by M-state machines for an infinite
sequence. This result extends further: it turns out that deterministic Markovian
machines compete successfully with every randomized FSM in the sense of
minimizing the expected value of $\sum_{t=1}^{n} l(b_t, x_t)$, where the expectation is with
respect to the randomization. For more general performance criteria, however, it
is demonstrated that this principle does not necessarily hold.

This property of Markovian strategies is then utilized in order to relate the
least asymptotic loss achievable by FSM's over individual sequences to that of the
probabilistic case where any limitations on the allowed nonanticipating strategies
are relaxed. Specifically, following Algoet [6], where the Shannon-McMillan-
Breiman theorem has been extended to a general sequential decision problem
under a stationary ergodic regime, we show that these two quantities agree with
probability one over an infinite sequence.

Markovian schemes are useful also in continuous alphabet applications.
One familiar example is prediction under the mean squared error (MSE)
criterion, i.e., $l(b_t, x_t) = (x_t - b_t)^2$, where the predictor
$b_t$ is given by a function $f(x_{t-k}, \ldots, x_{t-1})$ of the k most
recent outcomes, e.g., a linear predictor, where
$f(x_{t-k}, \ldots, x_{t-1}) = \sum_{i=1}^{k} c_i x_{t-i}$. The
sequential version of this linear predictor leads to the recursive
least squares (RLS) algorithm, which is here shown to be universal in
the above sense. Another example is vector quantization where
$x \in R^m$ and $l(b, x) = d(x, C_b(x))$, $d(\cdot,\cdot)$ being a
distortion measure and $C_b(\cdot)$ a quantization function with
quantization cells and centroids parametrized by b. Again, by allowing
b to depend on the k preceding samples (or their quantized versions),
we can implement a family of vector quantizers with memory, e.g.,
feedback quantizers, predictive quantizers, finite-state quantizers,
etc.
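
As an illustration of the sequential linear predictor mentioned above,
the following is a minimal sketch of a textbook recursive least squares
update for a kth order predictor under squared error loss. It is not
the authors' implementation: the function name `rls_predict`, the
regularization parameter `delta`, and the cosine test signal are
assumptions introduced here.

```python
import numpy as np

def rls_predict(x, k, delta=1.0):
    """Sequentially predict x[t] from the k most recent samples.

    Hypothetical helper: coefficients c are updated by standard RLS,
    starting from c = 0 and P = I / delta (delta is an assumed
    regularization constant, not a parameter from the text).
    Returns the predictions and the final coefficient vector.
    """
    x = np.asarray(x, dtype=float)
    c = np.zeros(k)                 # c[i] multiplies x[t-1-i]
    P = np.eye(k) / delta           # inverse of the regularized correlation matrix
    preds = np.zeros(len(x))
    for t in range(k, len(x)):
        u = x[t - k:t][::-1]        # most recent outcome first
        preds[t] = c @ u            # prediction b_t made before seeing x[t]
        err = x[t] - preds[t]
        Pu = P @ u
        gain = Pu / (1.0 + u @ Pu)  # RLS gain vector
        c = c + gain * err
        P = P - np.outer(gain, Pu)  # rank-one update of the inverse
    return preds, c

if __name__ == "__main__":
    # A sinusoid obeys x[t] = 2 cos(w) x[t-1] - x[t-2], so k = 2 suffices.
    x = np.cos(0.3 * np.arange(200))
    preds, c = rls_predict(x, k=2, delta=1e-3)
    print("coefficients:", c)
    print("mean squared error over last 50 samples:",
          np.mean((x[-50:] - preds[-50:]) ** 2))
```

The predictor only uses past outcomes at each step, which matches the
nonanticipating setting of the abstract; the choice of `delta` trades
early stability against bias and is a design choice of this sketch.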
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Abstract: 

In this paper we present a truncated sequential test for the detection
of weak signals in additive noise. Performance evaluations for both
truncated and untruncated tests are considered, and numerical results
are presented. We also develop a sequential test for weak signal
detection with M-level quantization of observed data. Numerical results
are presented for the cases when the signal is deterministic, and when
the signal is stochastic with known probability density. A performance
comparison of the sequential tests using quantized and unquantized data
is also provided.


Summary:

Detection of a signal in additive noise is formulated as the hypothesis
testing problem stated as follows:

$$H_1 : X_i = S_i + N_i, \quad i = 1, 2, \ldots, N \qquad (1)$$

$$H_0 : X_i = N_i, \quad i = 1, 2, \ldots, N$$

where $\mathbf{S} = (s_1, s_2, \ldots, s_N)^T$ is the signal sample
sequence, and $\mathbf{N} = (n_1, n_2, \ldots, n_N)^T$ represents the
additive noise. $\mathbf{X} = (x_1, x_2, \ldots, x_N)^T$ represents the
observed data vector. Hypothesis testing is implemented either as a
fixed-sample-size (FSS) test or as a sequential test. The FSS test
involves the comparison of a likelihood ratio against a single
threshold, while deciding in favor of either $H_0$ or $H_1$, and uses N
observed data samples in the process. Hence, a decision is reached only
after N observations have been received. An FSS test can be implemented
using several methods including the Neyman-Pearson and the Bayesian
techniques. The detector threshold is designed according to the
required detector performance, namely the probabilities of detection
and false alarm. In comparison with the FSS test, the sequential test
requires, on the average, a smaller number of samples to reach a
decision. A sequential test can be designed to minimize the average
detection time. The Sequential Probability Ratio Test (SPRT) derived by
Wald [1] is known to be the optimum sequential test.

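The Sequential Probability Ratio Test can be stated as follows:

$$LR_n(\mathbf{X}_n) = \frac{f_{X|H_1}(\mathbf{X}_n \mid H_1)}{f_{X|H_0}(\mathbf{X}_n \mid H_0)}
\begin{cases} \ge A & \Rightarrow \text{accept } H_1 \\ \le B & \Rightarrow \text{accept } H_0 \\ \text{otherwise} & \Rightarrow \text{continue test} \end{cases} \qquad (2)$$

where $LR_n(\mathbf{X}_n)$ is the likelihood ratio at the n-th stage of
the sequential test, with n being a random variable.
$f_{X|H_1}(\mathbf{x} \mid H_1)$ and $f_{X|H_0}(\mathbf{x} \mid H_0)$
are the multivariate density functions of $\mathbf{X}$ conditioned on
$H_1$ and $H_0$ respectively. A and B are the thresholds of the
sequential test. For prespecified probability of false alarm $\alpha$
and probability of miss $\beta$, approximate expressions for A and B
are obtained in [1] as:
$A = (1-\beta)/\alpha$ and $B = \beta/(1-\alpha)$.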
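
The stopping rule and Wald's threshold approximations can be sketched
in a few lines. This is an illustrative example only: it assumes a
known constant signal level `s` in i.i.d. Gaussian noise with standard
deviation `sigma`, which is a simpler setting than the weak-signal
tests developed in this summary, and the function names are
hypothetical.

```python
import math

def wald_thresholds(alpha, beta):
    """Wald's approximations A = (1 - beta) / alpha, B = beta / (1 - alpha)."""
    return (1.0 - beta) / alpha, beta / (1.0 - alpha)

def sprt_gaussian(samples, s, sigma, alpha, beta):
    """Run an SPRT for H1: x = s + noise versus H0: x = noise.

    Assumes i.i.d. zero-mean Gaussian noise with standard deviation
    sigma (an assumption of this sketch). Works with the log-likelihood
    ratio, so the thresholds are log A and log B.
    Returns (decision, number of samples used).
    """
    A, B = wald_thresholds(alpha, beta)
    log_a, log_b = math.log(A), math.log(B)
    llr = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        # log of N(x; s, sigma^2) / N(x; 0, sigma^2)
        llr += (s * x - 0.5 * s * s) / sigma ** 2
        if llr >= log_a:
            return "H1", n
        if llr <= log_b:
            return "H0", n
    return "undecided", n   # data ran out before either threshold was crossed

if __name__ == "__main__":
    print(wald_thresholds(0.01, 0.01))
    print(sprt_gaussian([1.0] * 20, s=1.0, sigma=1.0, alpha=0.01, beta=0.01))
```

The "undecided" outcome is exactly the situation a truncated test is
designed to resolve by forcing a decision at a fixed final stage.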


Consider the detection problem of a random signal sequence with known
multivariate density $f_S(\mathbf{S})$. The likelihood ratio for this
case can be expressed as follows [2]:

$$LR(\mathbf{X}) = \frac{\int f_N(\mathbf{X} - \mathbf{S})\, f_S(\mathbf{S})\, d\mathbf{S}}{f_N(\mathbf{X})} \qquad (3)$$

where $f_N(\cdot)$ is the density function of the additive noise.

Under weak signal conditions $f_{X|H_1}(\mathbf{x} \mid H_1)$ is
approximately equal to $f_{X|H_0}(\mathbf{x} \mid H_0)$, and the
likelihood ratio in (3) is therefore approximately equal to one. This
introduces difficulties in the implementation of a likelihood ratio
test. Alternatively, $f_N(\mathbf{X} - \mathbf{S})$ in (3) can be
expanded in a Taylor series around $\mathbf{S} = 0$. Assuming that the
signal is always small, and keeping only the first and second order
terms in $\mathbf{S}$, we can obtain a more manageable form of the
likelihood ratio to implement the hypothesis testing problem, i.e.:

y.  =  -^log(/iv(2f))  (*i)„  =  W 

(V;)  „  =  Zy  =  (/n  (« ) 

Sequential detection of weak signals using the Taylor series
approximation of the likelihood ratio given above has been considered
to a certain extent in [3]. Numerical results are not presented there,
and a performance comparison with the corresponding FSS test was not
considered. Also, in the sequential test considered in [3], the number
of samples required to terminate the test (the actual detection time)
can become large. We develop a truncated sequential test for weak
signal detection, based on the series approximation of the likelihood
ratio, in order to avoid prolonged test durations. At the truncation
stage the cumulative likelihood ratio is compared against a single
threshold T, and a decision is reached. The truncation stage and the
threshold T become design parameters of the truncated sequential test.
Performance evaluations of both untruncated and truncated sequential
tests, in terms of the Average Sample Number (ASN) function and the
operating characteristic function, are presented. The probability that
an untruncated test would terminate before the corresponding truncated
test is also presented.

We also consider a sequential test for weak signal detection with
M-level quantization of observed data, based on the series
approximation of the likelihood ratio. Quantization for sequential
signal detection for non-weak signal situations has been considered in
[6]. The optimal set of quantization thresholds is obtained by
minimizing a weighted sum of the ASN under each hypothesis. Numerical
results are presented for the case when the signal is deterministic. A
performance comparison between the sequential test for weak signals
based on unquantized observed data, and the sequential test using
quantized data, is also presented.
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Abstract 

Existing maximum-likelihood sequence estimation (MLSE) schemes for
channels with memory, resulting in intersymbol interference (ISI), have
typically been implemented using the Viterbi algorithm (VA). For
memoryless modulation schemes the resulting search complexity is
$O(M^L)$, where M is the alphabet size and L is the length of the ISI
span in channel signaling intervals. This complexity renders the VA
impractical for large M and/or L. In this paper we describe the
structure and properties of a novel reduced-complexity iterative MLSE
scheme based upon the expectation-maximization (EM) algorithm. This
reduced-complexity iterative MLSE scheme is shown to have complexity
$O(LM)$ at each iteration. The approach provides an attractive
alternative to the VA for large signaling alphabets and/or ISI span.

Summary 

Existing maximum-likelihood sequence estimation (MLSE) schemes for
linear channels with memory, resulting in intersymbol interference
(ISI), have typically been implemented using the Viterbi algorithm
(VA). The VA provides a structured dynamic programming search of the
underlying trellis defined by the modulator/channel cascade. For
memoryless modulation schemes the resulting search complexity is of the
order $M^L$, where M is the alphabet size and L is the length of the
ISI span, or the delay dispersion, measured in channel signaling
intervals. For modulation schemes with memory, or for coded systems
operating on ISI channels, the associated complexity can be
considerably greater than this. Thus, it is of some interest to develop
reduced-complexity MLSE techniques, and a host of research efforts have
been directed at this problem, all with varying degrees of success.
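
To give a feel for the gap in search effort, the short sketch below
compares the two growth rates. It assumes the Viterbi search grows as
M^L (the usual figure for an ISI span of L symbols) and takes the
per-iteration cost L*M quoted in the abstract for the iterative scheme;
the function names and the iteration count are hypothetical, and
constant factors are ignored.

```python
def viterbi_search_cost(M, L):
    """Order of the full trellis search, assumed here to be M**L."""
    return M ** L

def iterative_search_cost(M, L, iterations):
    """Order of the iterative scheme: L*M per iteration (from the abstract)."""
    return iterations * L * M

if __name__ == "__main__":
    for M, L in [(2, 5), (4, 5), (16, 8)]:
        print(M, L, viterbi_search_cost(M, L),
              iterative_search_cost(M, L, iterations=10))
```

The comparison only matters if the iteration count stays modest, which
is why the convergence behaviour of the iterative scheme is central.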

In the meantime, a fair amount of work has been done, mostly in the
statistics literature, in developing iterative solutions to a variety
of ML estimation problems which can be cast in terms of an incomplete
data problem. Here, the observations, called the incomplete data, are
related to another quantity, called the complete data, for which the ML
estimation problem is simpler. The expectation-maximization (EM)
algorithm [1] has been used in such situations to obtain an iterative
solution to the original ML estimation problem, based on the incomplete
data, which at each iteration is no more complex than obtaining an ML
solution of the much simpler problem based on the complete data. The EM
algorithm has found extensive applications in a variety of problem
areas including: spectral estimation [2], image reconstruction [3], and
image segmentation [4], [5]. Recently, the EM algorithm has been
applied to several communications problems including: problems of
carrier recovery [6] and channel state estimation [7]. Experience has
generally demonstrated rapid convergence of the EM algorithm and, since
each iteration is reasonably simple to implement, this generally leads
to substantial computational savings relative to straightforward ML
procedures.

In the present paper we apply the EM algorithm to the problem of MLSE
on linear ISI channels. This results in an iterative algorithm with
substantial computational savings over conventional full-search MLSE
approaches since we exploit, at each stage, the rather simple structure
of the ML solution based upon the associated complete data. While there
are many ways to relate the observations to a corresponding complete
data quantity, the particular formulation we consider is suggested by
related work on the ML parameter estimation problem for superimposed
signals [8] which is directly applicable to the MLSE problem treated
here.

In this work we provide the formal development of the proposed
reduced-complexity MLSE approach and describe some of its performance
characteristics. The relative complexity advantages of this scheme
depend, of course, on how many iterations are required for acceptable
convergence. This is related to the resulting error probability and is
best determined by simulation. We provide simulation results
demonstrating the rapid convergence properties of this
reduced-complexity iterative MLSE scheme.
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ABSTRACT 

The  problem  of  estimating  the  symbol  timing  in  wideband  data  com¬ 
munication  signals  is  considered.  Conventional  approaches  to  this 
problem  suffer  from  several  drawbacks,  including  possible  lack  of  con¬ 
sistency  due  to  multiple  extrema  in  the  error  surface,  and  very  slow 
convergence  due  to  exceedingly  sharp  waveform  correlation  functions. 
In  this  work,  sequential  estimation  algorithms  that  alleviate  these 
problems  are  constructed  and  analyzed.  These  algorithms  are  based 
on  two  techniques:  the  use  of  regularization  (i.e.,  prefiltering)  to  pro¬ 
duce  a  consistent  initial  estimate  at  the  expense  of  higher  mean-square 
error;  and  the  coupling  of  recursive  maximum-likelihood  with  this 
consistent  estimator  to  produce  the  desired  goal  -  a  recursive  consis¬ 
tent  and  efficient  estimator. 

1  Introduction  and  Summary 

In  this  work  we  consider  the  problem  of  delay  estimation  in  wideband 
binary  digital  communication  systems.  Let  the  received  signal  be 
modeled  as 

$$dr(t) = a \sum_{i \ge 0} b_i\, s(t - iT - \tau^*)\, dt + \sigma\, dw(t) \qquad (1)$$

where a is the received amplitude; $b_i$ - i'th data bit
($\{b_i\}_{i \ge 0}$ is a sequence of iid equiprobable random variables
in $\{-1, 1\}$); T - duration of symbol interval; $s(t)$ - code
waveform; $\tau^*$ - unknown delay, $\tau^* \in [0, T)$; $w(t)$ -
standard Brownian motion; $\sigma^2$ - channel noise intensity. We
assume that $s(t) = 0$ for $t \notin (0, T]$, and that it can be
written as $s(t) = \sum_{i=0}^{N_c - 1} \gamma_i\, \psi(t - iT_c)$,
where $\psi$ are basic "chip" waveforms, $N_c$ is the number of chips
in symbol interval and $\{\gamma_i\}$ is a maximal length sequence [1].
The problem addressed here is that of estimating the delay $\tau^*$
given the observation signal $\{r(t), t \ge 0\}$, where the receiver
has a knowledge of a, $\sigma$, and the waveform $s(t)$. Denote by
$c(\tau)$ the periodic autocorrelation function of $s(t)$. We assume
throughout that $c(\tau)$ has no local maxima in the interval
$[-T/2, T/2]$.
Consider for a moment the idealized system where the iid data bits are
known to the receiver (or estimator). In this case the maximum
likelihood (ML) estimate of $\tau^*$ based on observations of the first
n time units is the value of $\hat{\tau}$ that maximizes the output of
a filter matched to the waveform $\sum_i b_i\, s(t - iT - \hat{\tau})$.
In cases of high signal to noise ratio (or n large) this maximization
can be viewed as that of a "close" estimate of the aperiodic
autocorrelation function of the waveform $s(t)$. Classical approaches
for obtaining $\hat{\tau}$ include serial search techniques (see [2]
and the references therein) and gradient search methods. In general,
the aperiodic autocorrelation function of $s(t)$ is not guaranteed to
have a unique maximal point even when $\{\gamma_i\}$ is a maximal
length sequence, and thus gradient search algorithms can result in a
nonconsistent estimate for $\tau^*$. Moreover, typical autocorrelations
of code sequences are sharply peaked and have low sidelobes. Therefore,
if the initial guess $\hat{\tau}$ is far from the exact delay, the
output of the matched filter provides only little information (if any)
on the direction and distance of $\tau^*$, and gradient search
algorithms can wander in the flat zone of the autocorrelation function
for a long time before an initial lock is achieved.

More realistic is the situation where the sequence $\{b_i\}_{i \ge 0}$
is not known to the receiver, and the estimation of $\tau^*$ is
performed due to the fact that knowledge of the delay is a prerequisite
for reliable detection of the bits. Decision-directed procedures for
estimating the delay in this setup are described in [3] and in the
references there. These algorithms also suffer from the drawbacks
described above.

*This work was supported in part by a Wotbon Postdoctoral Fellowship, and
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In  this  work  we  construct  and  analyze  sequential  detection-estim¬ 
ation  algorithms  that  alleviate  these  problems.  We  suggest  a  recursive 
scheme for estimating $\tau^*$, which is based on maximization of a
smoothed version of the periodic autocorrelation function instead of
the aperiodic correlation itself, and has a good initial lock property
at the expense of higher asymptotic mean square error. The construction
of this procedure is based on the following observation. Fix some
$0 < \Delta < T/2$ and define the symmetric smoothing kernel
$\phi_\Delta$ and the smoothed waveform $s_\Delta(t)$ as

$$\phi_\Delta(t) = \begin{cases} 0, & t \notin [-\Delta, \Delta] \\ (\Delta + t)/\Delta^2, & t \in [-\Delta, 0] \\ (\Delta - t)/\Delta^2, & t \in [0, \Delta] \end{cases}, \qquad s_\Delta = s * \phi_\Delta$$

where $f * g$ stands for the convolution of $f(\cdot)$ and
$g(\cdot)$. For every integer $l \ge 0$, let $y_l(\tau, \Delta)$ stand
for the output of a filter matched to $s_\Delta(t - lT - \tau)$ and
driven by the observation process. Define

$$\rho_\Delta(l, \tau) = \int_{-\infty}^{\infty} s_\Delta(t - lT - \tau)\, s(t)\, dt, \qquad c_\Delta = c * \phi_\Delta.$$

It is easy to verify that
$c_\Delta(\tau) = \sum_{i=-1}^{1} \rho_\Delta(i, \tau)$ for
$\Delta > 0$ and $\tau \in [-T/2, T/2]$. Since the data bits are
independent and equiprobable,
the  following  identity  holds 

Et‘  [ll?(r.  A)  +  2yi(r,  A)y/+i(r,  A)  -|-  2y/(r,  A)y/+j(r,  A)] 

=  CaCf  -  F*)  +  <7*Pa(0,0)  +  2a*/»A(1.0).  (2) 

The  right  hand  side  of  (  2)  depends  on  r  only  via  the  periodic  corre¬ 
lation  function  of  the  signal  (or  its  smoothed  version).  This  suggests 
that  an  algorithm  with  good  initial  lock  properties  can  be  constructed 
by  choosing  an  appropriate  A  and  performing  recursive  stochastic 
maximization  (with  respect  to  r)  on  the  left  hand  side  of  (  2).  More¬ 
over,  this  scheme  would  not  utilize  any  decisions  on  the  data  bits. 
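
The smoothing step can be illustrated numerically. The sketch below
assumes the kernel is the unit-area triangle of half-width Delta
(zero outside [-Delta, Delta], rising as (Delta + t)/Delta^2 and
falling as (Delta - t)/Delta^2) and approximates the convolution on a
uniform grid; the function names, the grid spacing `dt`, and the
rectangular chip shape in the usage example are assumptions made for
illustration only.

```python
import numpy as np

def triangular_kernel(delta, dt):
    """Sample the assumed triangular smoothing kernel on a grid of step dt."""
    t = np.arange(-delta, delta + dt / 2, dt)
    phi = np.maximum(delta - np.abs(t), 0.0) / delta ** 2
    return t, phi

def smooth_waveform(s, delta, dt):
    """Approximate s * phi (continuous convolution) by a Riemann sum."""
    _, phi = triangular_kernel(delta, dt)
    return np.convolve(s, phi, mode="same") * dt

if __name__ == "__main__":
    dt = 0.01
    # Hypothetical code waveform: +/-1 rectangular chips, 100 samples each.
    chips = np.array([1, 1, 1, -1, 1, -1, -1])
    s = np.repeat(chips, 100).astype(float)
    s_smooth = smooth_waveform(s, delta=0.5, dt=dt)
    print("kernel area:", triangular_kernel(0.5, dt)[1].sum() * dt)
    print("peak of smoothed waveform:", np.abs(s_smooth).max())
```

A wider kernel flattens the sharp correlation peak, which is the
trade-off described in the text: easier initial lock in exchange for a
larger asymptotic mean square error.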

In  most  applications,  recursive  maximization  of  the  log-likelihood 
function  results  in  an  algorithm  which,  under  the  assumption  that  it 
converges  to  the  consistent  root,  can  be  made  asymptotically  efficient. 
Based  on  this,  we  construct  a  second  algorithm  by  coupling  a  recur¬ 
sive  ML  scheme  to  the  smoothed  correlation  scheme,  which  serves  as 
a  “guide”  to  the  correct  root  of  the  likelihood  function.  Using  known 
results,  it  can  be  shown  that  the  resulting  algorithm  is  consistent  and 
asymptotically efficient; that is, the delay estimate converges w.p.1 to
$\tau^*$ (as $n \to \infty$), and the asymptotic mean square error is the optimal
one.  This  technique  has  been  demonstrated  in  the  related  problem  of 
multiuser  amplitude  estimation  in  [4]. 
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Abstract - Linear least squares estimation techniques can
be used to enhance suppression of narrowband interference in
direct-sequence spread-spectrum systems. Nonlinear techniques
for this purpose have also been investigated recently. Here, we
derive maximum-likelihood receivers for direct-sequence signals in
Gaussian interference with known second order characteristics.

It is shown that if the receiver uses samples from outside the
bit interval, then the receiver structure (called ML II) is
nonlinear. The bit error rate performances of these ML receivers
are compared to those of linear receivers employing one-sided and
two-sided least squares estimation filters, for the case of
Gaussian autoregressive interference. It is shown that the ML II
receiver outperforms the matched filter, the one-sided and the
two-sided transversal filters.

I. INTRODUCTION

Direct-sequence spread-spectrum systems offer an inherent
capability of rejecting narrowband interference. This is
achieved by modulating the bit waveform with a PN signal before
transmission and correlating the received signal with a replica
of the PN signal. In this way, interfering signals, whose
bandwidths are narrow compared to the spread signal, are
attenuated by the receiver. Processing the received signal prior
to correlating with the PN sequence has been employed to improve
the suppression of narrowband interference. Linear least squares
estimation techniques to estimate and subtract the narrowband
interference have been studied [1]. Nonlinear techniques for
interference suppression in spread-spectrum systems have been
investigated in [2]. Here, we study the performance of
maximum-likelihood receivers for direct sequence spread spectrum
signals received in Gaussian interference with known second order
statistics. When the receiver operates on the observations in
the bit duration only, the receiver is the well known linear
detector known as the matched filter. When the observation
interval extends outside the bit interval, the receiver structure
is shown to be nonlinear. The nonlinearity arises not due to the
modeling of the binary chip sequence as random as in [2], but
due to the uncertainty on the bits adjacent to the bit being
tested.


II. MAXIMUM-LIKELIHOOD RECEIVERS


We consider here the performance of maximum-likelihood
receivers for the following problem. We shall restrict to the
case where an entire maximal length PN code sequence is embedded
in each bit (so called short PN sequences). A similar analysis
can be easily done for the case of long PN sequences. Let the
received signal be processed by a chip-matched filter and sampled
at the chip rate of the PN sequence to yield [2]:


*k“V‘*k*ik 

where $s_k = S b_k c_k$. Without loss of generality, the signal
strength S is assumed to be 1.0. $c_k$ is the kth chip of the PN
sequence with chip interval $\tau_c$; $c_k$ for $k < 0$ or
$k > L-1$ is taken modulo L. $b_k \in \{+1, -1\}$ is the binary
information with bit duration $T_b = L\tau_c$, and L is the
processing gain given as the number of PN chips per message bit.
Note that $b_k = b \in \{\pm 1\}$ for all k in the same bit
interval. $n_k$ is a sequence of zero mean i.i.d. Gaussian noise
with known variance $\sigma^2$. $i_k$ is a sequence of narrowband
interference modeled as a zero mean Gaussian process with
autocovariance $R_i(k)$. The detection problem is:


allb^over  the  current  bit  (Le.b)B  {  ^  (2) 

Let $v_k = n_k + i_k$ be the white noise plus the interference, with
autocovariance $R_v(m) = \sigma^2 \delta(m) + R_i(m)$. Let A be the
$L \times L$ covariance matrix of $\{v_k\}$. The maximum-likelihood
detector for the detection problem in (2) is given by:

^0  O) 

'> . 

2.1 ML II Receiver and its Bit Error Rate

Now consider the observation vector to consist of the chips
corresponding to the bit under test appended with some chips from
the previous bit, i.e. the receiver has to test the present bit
but uses observation samples from the present bit interval and a
part of the previous bit interval. Let
$S_1 = [c_{L-l}, \ldots, c_{L-1}]^T$ be the vector of the last l
chips of the PN sequence from the previous bit, $l < L$. The
likelihood ratio and the corresponding maximum-likelihood detector
for the detection problem in (2) are then given by:

Md- _  1  (4) 


By straightforward calculations involving partitioned vectors and
matrices, it can be shown that the bit error probability for the
detector in (4) is given by:

PrfsmhfSj)  >  T  amWSj)  I  (6) 


where  S  b  a  *  t  •a"  constant 

obtained from the entries in the A matrix and the $S_1$ vector. The
test statistic given by (4) is nonlinear in the observations. The
receiver based on (4) will be called ML II.

III. PERFORMANCE COMPARISON

The bit error rate performances of the ML I and ML II
receivers are evaluated numerically and compared to the
performances of the one-sided and two-sided transversal filters.
The narrowband interference is modeled as a second order zero
mean Gaussian autoregressive process with known parameters. As
expected, both the maximum-likelihood receivers and the
transversal filters perform better when the power spectral
density is peaky. The nonlinear ML II receiver outperforms the
matched filter receiver and the one-sided and two-sided
transversal filters.
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Summary 

In his celebrated paper [1], Shannon stated that in information
transmission over a noisy channel, "redundancy must be introduced in
the proper way to combat the particular noise structure involved.
However, any redundancy in the source will usually help if it is
utilized at the receiving point. In particular, if the source already
has a certain redundancy and no attempt is made to eliminate it in
matching to the channel, this redundancy will help combat noise."

This statement, though made more than forty years ago, forms the
foundation of the present work. The principal assumption here is that
the source to be transmitted has a certain redundancy and due to
certain constraints (for example, on the complexity), the transmitter
makes no attempt to "match" the source to the channel. Instead, the
source is transmitted directly over the channel. The problem thus is to
design a receiver which fully "utilizes" the source redundancy to
combat the effect of channel noise.

It is hypothesized that the source is in the form of a discrete Markov
chain and that the channel is a discrete memoryless channel. The
receiver is a maximum a posteriori (MAP) receiver (detector). The
redundancy between successive symbols of the Markov source is used by
the MAP detector to provide some protection against channel errors.

The above formulation has been considered before by several authors.
The most notable is the work by Drake [2], who provided the optimal
instantaneous MAP decoding rule as well as bounds on the achievable
probability of error. Drake also studied the special case of binary
symmetric Markov source and binary symmetric channel and gave a
necessary and sufficient condition for the optimality of the singlet
("believe-what-you-see") decoding rule. More recently, Sayood and
Borkenhagen [3] considered the detection of a discrete Markov source
over a discrete memoryless channel in a joint source-channel DPCM image
coding system.

In this work, we consider two variations of this problem: (i) sequence
MAP detection, which is to determine the most probable transmitted
sequence given an observed sequence, and (ii) instantaneous MAP
detection, which is to determine the most probable transmitted symbol
at a particular time given all the observations up to that time. The
solution to the first problem results in a "Viterbi-like"
implementation of the MAP detector (with large delay) while the latter
problem results in a recursive implementation (with no delay). For the
special case of binary symmetric Markov source and binary symmetric
channel, we give a necessary and sufficient condition (similar to
Drake) for the optimality of the "believe-what-you-see" sequence MAP
decoding rule (see [4]). Extensive simulation results for this special
case are given in [4].
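
The "Viterbi-like" sequence MAP detector can be sketched for the
special case discussed above. This is an illustrative dynamic program,
not the authors' implementation: it assumes a binary symmetric Markov
source that repeats its previous symbol with probability `stay`, a
binary symmetric channel with crossover probability `eps` strictly
between 0 and 1, and a nonempty observation sequence; all names are
hypothetical.

```python
import math

def sequence_map_bsc(y, stay, eps, prior=(0.5, 0.5)):
    """Most probable source sequence given observations y (list of 0/1).

    Maximizes log P(x) + log P(y | x) over all binary sequences x by a
    two-state Viterbi recursion. Requires 0 < stay < 1 and 0 < eps < 1.
    """
    def log_trans(prev, cur):
        return math.log(stay if prev == cur else 1.0 - stay)

    def log_chan(sent, received):
        return math.log(1.0 - eps if sent == received else eps)

    # score[s] = best log-probability of any path ending in state s
    score = [math.log(prior[s]) + log_chan(s, y[0]) for s in (0, 1)]
    back = []                       # back-pointers, one pair per step
    for obs in y[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[p] + log_trans(p, s) for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + log_chan(s, obs))
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

if __name__ == "__main__":
    # A strongly correlated source: an isolated flipped bit is likely an error.
    print(sequence_map_bsc([0, 0, 0, 1, 0, 0, 0], stay=0.95, eps=0.2))
```

When the source has no memory (`stay` = 0.5) the detector simply
returns the received sequence, which is the "believe-what-you-see"
rule; the source redundancy is what lets it override the channel
output.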

†This work was supported in part by National Science Foundation grants
NSFD MIP-88-57311 and NSFD CDR-85-00108, and in part by NTT
Corporation and General Electric Co.


The solutions to the above problems are applied to a combined
source-channel coding problem. The source is assumed to be highly
correlated and the source encoder is a small-block-size vector
quantizer (VQ). Since the VQ input is correlated from block to block,
its output is also correlated. This correlation is referred to as the
"residual" redundancy [3]. For simplicity, we model the VQ output as a
discrete Markov source. The MAP detectors described above are then used
for error detection and correction of the VQ indexes. Simulation
results for this system on a Gauss-Markov source are obtained and
comparisons are made with Farvardin and Vaishampayan's
channel-optimized VQ (COVQ) [5, 6] and the ordinary VQ designed for a
noiseless channel. Table 1 shows a summary of our simulation results.
More extensive results can be found in [4].


ε        VQ     VQ+Inst. MAP   VQ+Seq. MAP   COVQ     VQ     VQ+Inst. MAP   VQ+Seq. MAP   COVQ [5,6]

t  =  l;R=1.0 

o 

11 

<< 

n 
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7.90 
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7.34 
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0.005    7.37   7.61           7.70          7.31     9.08   9.40           9.64          9.15
0.010    6.88   7.30           7.49          6.83     8.21   8.76           9.18          8.37
0.050    4.21   5.65           6.31          4.37     4.31   5.51           6.62          6.23
0.100    2.27   4.01           5.04          2.76     1.95   3.27           4.49          4.65

Table 1: SNR (in dB) Performances of Combined Source-Channel Coding
Schemes Using MAP Detection for a Gauss-Markov Source with ρ = 0.9;
k ≡ Dimension; R = Rate (Bits/Sample); ε = Channel Bit Error Rate.
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1.  Overview 

We consider a discrete channel with memory in which errors spread like
the spread of a contagious disease through a population. Our
motivation is the observation by Stapper et al. that the
Polya-Eggenberger (PE) distribution is a better "fit" to the
distribution of defects in silicon than the commonly used Poisson
distribution. The PE distribution is one of the distributions generated
by Polya's urn model for the spread of contagion. We introduce a
communication channel with noise modeled by Polya's process. We first
present a maximum likelihood (ML) decoding algorithm; we then show that
this channel is in fact an "averaged" channel in the sense of Ahlswede
and others, and its capacity is zero. Finally, we consider a
finite-memory version of the Polya-contagion model; this channel is
(unlike the original) ergodic with a non-zero capacity.

2.  Polya-Contagion  Channel 

Consider a discrete binary additive communication channel:
$Y_i = X_i \oplus Z_i$, where the random variables $X_i$, $Z_i$, and
$Y_i$ are, respectively, the i'th input, noise, and output of the
channel. We assume that the input and noise sequences are independent.
The noise sequence $\{Z_i\}$ is generated according to Polya's
contagion urn scheme, described as follows. An urn originally contains
T balls, of which R are red and S are black. Let $\rho = R/T$ and
$\sigma = 1 - \rho = S/T$. We make successive draws from the urn; after
each draw, we return to the urn $1 + \Delta$ balls of the same color as
was just drawn. In our problem we assume that $\Delta > 0$ (contagion
case) and that $\rho < \sigma$. The noise sequence $\{Z_i\}$ is
generated by the draws: $Z_i = 1$ if the i'th draw yields a red ball
and $Z_i = 0$ if the i'th draw yields a black ball.
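
The urn scheme just described translates directly into a short
simulation. The sketch below is illustrative only; the function name
`polya_noise` and the parameter values in the usage example are
assumptions, while the update rule (add Delta balls of the drawn color
after each draw) follows the description above.

```python
import random

def polya_noise(n, red, black, delta, rng):
    """Generate n noise bits from Polya's contagion urn.

    red, black: initial numbers of red and black balls (R and S).
    delta: net number of balls of the drawn color added after each draw.
    rng: a random.Random instance, passed in for reproducibility.
    Returns a list of 0/1 values (1 = red ball drawn = channel error).
    """
    z = []
    for _ in range(n):
        if rng.random() < red / (red + black):
            z.append(1)
            red += delta        # errors make further errors more likely
        else:
            z.append(0)
            black += delta
    return z

if __name__ == "__main__":
    # Different runs settle at different error rates, which is the
    # non-ergodic behaviour of the channel.
    for seed in range(3):
        z = polya_noise(10000, red=1, black=9, delta=2, rng=random.Random(seed))
        print("seed", seed, "fraction of errors:", sum(z) / len(z))
```

Setting `delta` to zero in this sketch reduces the noise to i.i.d.
draws with error probability R/T, i.e. a memoryless binary symmetric
channel.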

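For an input block $\underline{X} = [X_1, \ldots, X_n]$ and an output
block $\underline{Y} = [Y_1, \ldots, Y_n]$, the block transition
probability of the channel is:

$$P(\underline{Y} = \underline{y} \mid \underline{X} = \underline{x}) = \frac{\Gamma(1/\delta)\, \Gamma(\rho/\delta + d)\, \Gamma(\sigma/\delta + n - d)}{\Gamma(\rho/\delta)\, \Gamma(\sigma/\delta)\, \Gamma(1/\delta + n)} \qquad (1)$$

where $d = d_H(\underline{y}, \underline{x})$ is the Hamming distance
between $\underline{y}$ and $\underline{x}$, and $\delta = \Delta/T$.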

Channel Properties: Two important properties: (1) Stationarity:
from equation (1) the noise $\{Z_i\}$ forms an infinite sequence of
exchangeable random variables. Therefore, the noise process is strictly
stationary. (2) Non-Ergodicity: Let $S_n = Z_1 + Z_2 + \cdots + Z_n$. It can
be shown that $Z = \lim_{n \to \infty} S_n/n$ is (with probability one) a random
variable drawn according to the beta distribution with parameters $\rho/\delta$
and $\sigma/\delta$. Thus the noise process $\{Z_i\}$ is not ergodic since its sample
average does not converge to a constant.

Maximum Likelihood (ML) Decoding: Suppose $M$ codewords
$x_1, \ldots, x_M$, each of length $n$, are possible channel inputs.

Given an output $y$, ML decoding selects as its estimate of the
transmitted codeword the $x_k$ that maximizes $P(Y = y \mid X = x_k)$.

Now $g(d) = P(Y = y \mid X = x)$ is strictly log-convex in $d \in [0, n]$
with a unique minimum at $d_0 = n/2 + (1 - 2\rho)/(2\delta)$. Thus the ML
decoding algorithm for the channel is given as follows:

1. Given the received vector $y$, compute $d_i = d_H(y, x_i)$ for each $i$.
Compute also $d_{\max} = \max_i\{d_i\}$ and $d_{\min} = \min_i\{d_i\}$.

2. If $|d_{\max} - d_0| \le |d_{\min} - d_0|$, map $y$ to the $x_j$ for which $d_j = d_{\min}$.
In this case ML decoding $\iff$ minimum distance decoding.

3. If $|d_{\max} - d_0| > |d_{\min} - d_0|$, map $y$ to the $x_j$ for which $d_j = d_{\max}$.
In this case ML decoding $\iff$ maximum distance decoding.
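
The three decoding steps can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the names `hamming` and `ml_decode` are hypothetical, codewords are assumed to be equal-length tuples of bits, `delta` is assumed to be Delta/T, and ties are resolved in favor of minimum distance.

```python
def hamming(u, v):
    """Hamming distance between two equal-length bit sequences."""
    return sum(a != b for a, b in zip(u, v))

def ml_decode(y, codewords, rho, delta):
    """ML decoding for the Polya-contagion channel (sketch).

    The likelihood g(d) is log-convex and symmetric about d0, so the most
    likely codeword is the one whose distance to y lies farthest from d0;
    that is always either the closest or the farthest codeword.
    """
    n = len(y)
    d0 = n / 2 + (1 - 2 * rho) / (2 * delta)
    dists = [hamming(y, c) for c in codewords]
    d_min, d_max = min(dists), max(dists)
    if abs(d_max - d0) <= abs(d_min - d0):
        target = d_min   # minimum distance decoding
    else:
        target = d_max   # maximum distance decoding
    return codewords[dists.index(target)]
```

With strong contagion the second branch can fire: a received word far from every nearby codeword may be better explained by a long burst of errors than by a moderate number of them.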


Averaged Communication Channels: Consider a family of
discrete memoryless channels parameterized by $\theta$:

$$\left\{ W_\theta^{(n)}(Y = y \mid X = x) = \prod_{j=1}^{n} W_\theta(Y_j = y_j \mid X_j = x_j) : \theta \in \Theta \right\}_{n=1}^{\infty}.$$

A channel is "averaged" if its block transition probability is the
expected value of the block transition probability taken with respect to
some distribution on $\theta$, i.e., if it is of the form

$$W^{(n)}(Y = y \mid X = x) = \int_{\Theta} W_\theta^{(n)}(Y = y \mid X = x)\, dG(\theta) \qquad (2)$$

where $(\Theta, \sigma(\Theta), G)$ is a probability space for the random variable $\theta$.
Note that the averaged channel has memory and is stationary.

Claim: The binary Polya-contagion channel is an averaged channel;
specifically, the Polya-contagion channel represents the class of
binary symmetric channels with crossover probability $\theta$, where $\theta$ is
distributed according to the beta distribution with parameters $\rho/\delta$
and $\sigma/\delta$. Furthermore, from the results of Ahlswede we can show that
the capacity of this channel is zero.
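
The first part of the claim can be checked directly. The following short derivation is a sketch, assuming $\delta = \Delta/T$ and using $\rho/\delta + \sigma/\delta = 1/\delta$; it averages the block transition probability of a binary symmetric channel with crossover probability $\theta$ over the stated beta density and recovers the Gamma-function form of the Polya-contagion channel.

```latex
\begin{align*}
\int_0^1 \theta^{d}(1-\theta)^{n-d}\,
  \frac{\Gamma(1/\delta)}{\Gamma(\rho/\delta)\,\Gamma(\sigma/\delta)}\,
  \theta^{\rho/\delta-1}(1-\theta)^{\sigma/\delta-1}\,d\theta
&= \frac{\Gamma(1/\delta)}{\Gamma(\rho/\delta)\,\Gamma(\sigma/\delta)}
   \int_0^1 \theta^{\rho/\delta+d-1}(1-\theta)^{\sigma/\delta+n-d-1}\,d\theta \\
&= \frac{\Gamma(1/\delta)\,\Gamma(\rho/\delta+d)\,\Gamma(\sigma/\delta+n-d)}
        {\Gamma(\rho/\delta)\,\Gamma(\sigma/\delta)\,\Gamma(1/\delta+n)} .
\end{align*}
```

Here $d$ is the Hamming distance between input and output blocks, and the last step is the standard beta integral.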

3. Finite-Memory Contagion Channel

An unrealistic aspect of the Polya-contagion channel is its infinite
memory. Consider, for instance, the millionth ball drawn from Polya's urn;
the very first ball drawn from the urn and the 999,999'th ball drawn
from the urn have the identical effect on the outcome of the millionth
draw. We now consider a perhaps more realistic model for a contagion
channel with finite memory.

As before, consider an urn with $T$ balls, of which $R$ are red and
$S = T - R$ are black. At the j'th draw we select a ball from the urn
and replace it with $1 + \Delta$ balls of the same color; then, $M$ draws later
(after the $(j + M)$'th draw) we retrieve from the urn $\Delta$ balls of the
color picked at time $j$. As before, let $Z_i = 1$ if the i'th draw yields a red
ball and $Z_i = 0$ if the i'th draw yields a black ball. This modification
keeps the total number of balls in the urn constant ($T + M\Delta$ balls)
after an initialization period of $M$ draws; it also limits the effect of
any draw to $M$ draws in the future.
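
A minimal simulation of the finite-memory urn is sketched below, under the reading that the extra balls added at draw j are removed immediately after draw j + M. The function name `finite_memory_noise`, the `seed` argument, and the returned final urn contents are assumptions made for illustration.

```python
import random
from collections import deque

def finite_memory_noise(n, R, S, delta, M, seed=None):
    """Simulate n draws from the finite-memory contagion urn (sketch).

    Returns the noise bits and the final (red, black) ball counts.
    """
    rng = random.Random(seed)
    red, black = R, S
    window = deque()  # draws whose extra balls are still in the urn
    bits = []
    for _ in range(n):
        z = 1 if rng.random() < red / (red + black) else 0
        bits.append(z)
        # Return the drawn ball plus delta extra balls of its color.
        if z:
            red += delta
        else:
            black += delta
        window.append(z)
        # After the (j + M)'th draw, retrieve the delta balls added at draw j.
        if len(window) > M:
            old = window.popleft()
            if old:
                red -= delta
            else:
                black -= delta
    return bits, (red, black)
```

Once more than M draws have been made, the urn always holds T + M*delta balls and its red count depends only on the last M noise bits, which is the Markov property used in the text.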

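For blocklength $n \le M + 1$, the block transition probability of this
new channel is given by (1). For $n \ge M + 2$, we obtain:

$$P(Y = y \mid X = x) = L \prod_{i=M+1}^{n} \left[\frac{\rho + s_{i-1}\delta}{1 + M\delta}\right]^{e_i} \left[\frac{\sigma + (M - s_{i-1})\delta}{1 + M\delta}\right]^{1 - e_i}$$

where $L = \left[\prod_{i=0}^{k-1}(\rho + i\delta) \prod_{j=0}^{M-1-k}(\sigma + j\delta)\right] / \prod_{l=1}^{M-1}(1 + l\delta)$. Here, $e_i = x_i \oplus y_i$,
$k = e_1 + \cdots + e_M$, and $s_{i-1} = e_{i-1} + \cdots + e_{i-M}$.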

Claim: The new noise process $\{Z_i\}$ is a stationary ergodic Markov
process of order $M$, and thus the resulting channel is a Markov channel
with memory $M$. The capacity of the channel is given by:

where  U  =  {n}Zo(p  +  j6)  +  «)]/  nif=,‘(l  +  m6),  and  /ij(*) 

is  the  binary  entropy  function. 

Finally if we let $M \to \infty$, $C_M \to 1 - \int_0^1 h_b(z) f(z)\, dz$, where $f(z)$
is the beta pdf with parameters $\rho/\delta$ and $\sigma/\delta$. This result is identical to
$\lim_{n \to \infty} (1/n) I(X^n; Y^n)$ if $X^n$ and $Y^n$ are blocks of length $n$ joined by
the original Polya-contagion channel (with equally likely inputs). Thus
as $M \to \infty$, the stationary ergodic finite-memory contagion channel
converges in distribution to the stationary non-ergodic Polya channel,
but $C_M$ does not converge to $C_{Polya} = 0$.

I. INTRODUCTION

We study the extraction of AM-FM information in signals of the form

$$s(t) = a(t) \cos[\phi(t)], \qquad (1)$$

with time-varying amplitude $a$ and instantaneous frequency $\omega_i = \dot{\phi}$, using the
operator $\Psi(s) = (\dot{s})^2 - s\ddot{s}$ developed by Teager [1] and Kaiser [2], shown to be
highly effective for detecting AM-FM modulations [3]. For signals of the form
(1), $\Psi(s) \approx a^2 \omega_i^2$ and $\Psi(\dot{s}) \approx a^2 \omega_i^4$ with small approximation error under
realistic conditions [3]. This motivates the energy separation algorithm (ESA):

$$\hat{a}^2 = \Psi(s)^2 / \Psi(\dot{s}), \qquad \hat{\omega}_i^2 = \Psi(\dot{s}) / \Psi(s).$$
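
The energy separation idea can be illustrated on a pure sinusoid, for which the approximations hold exactly. The sketch below assumes the continuous-time Teager-Kaiser operator Psi(s) = (s')^2 - s s'' and takes the derivatives of the signal as inputs; the function names and the analytic test signal are hypothetical and not taken from the paper.

```python
import math

def teager_kaiser(s, ds, dds):
    """Continuous-time energy operator Psi(s) = (s')^2 - s * s''."""
    return ds * ds - s * dds

def esa(s, ds, dds, ddds):
    """Energy separation: amplitude and frequency estimates from Psi(s), Psi(s')."""
    psi_s = teager_kaiser(s, ds, dds)
    psi_ds = teager_kaiser(ds, dds, ddds)
    omega = math.sqrt(psi_ds / psi_s)    # omega^2 ~ Psi(s') / Psi(s)
    amp = psi_s / math.sqrt(psi_ds)      # a^2 ~ Psi(s)^2 / Psi(s')
    return amp, omega

def cosine_derivatives(a, omega, t, phase=0.0):
    """Value and first three derivatives of a*cos(omega*t + phase)."""
    arg = omega * t + phase
    s = a * math.cos(arg)
    ds = -a * omega * math.sin(arg)
    dds = -a * omega ** 2 * math.cos(arg)
    ddds = a * omega ** 3 * math.sin(arg)
    return s, ds, dds, ddds
```

Calling `esa(*cosine_derivatives(2.0, 3.0, 0.7))` recovers the amplitude and the frequency of the cosine up to rounding; for genuinely modulated signals the same formulas give approximations only.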

II. ENERGY OF FILTERED NOISY AM-FM SIGNAL
Define a noisy AM-FM signal $f = s + n$, with $s$ given by (1) and $n$ a
zero-mean WSS Gaussian random process with autocorrelation $R(\tau)$ and power
spectral density $\Phi(\omega)$. Consider bandpass filters with scaled, translated
frequency responses

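where $H_\sigma(\omega) = \sigma^{-1/2} H(\omega/\sigma)$ is low-pass and unit energy, and denote
$f_\sigma = s_\sigma + n_\sigma = s * g_\sigma + n * g_\sigma$. An important approximation is often used here:

$$s_\sigma \approx \hat{s}_\sigma = a\,|G_\sigma(\omega_i)| \cos[\phi + \angle G_\sigma(\omega_i)]. \qquad (3)$$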

The error (zero for a monochromatic signal) is bounded as follows. First define
Af(ga)=llg  ^  lgo(»)l*  *]>/*.  m  =  [|,  la  (/)!*<*]*/*. 

Theorem  1  -  Let  E,  =  Ir^r  -  r  gl  and  =  sup^t)).  Then 

fr  ^  a"  “max  +  2  Aj(gg)-S(<i).  * 

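We can also bound estimates of the energy $\Psi$:

$$\Psi(\hat{s}_\sigma) = (a\omega_i)^2 |G_\sigma(\omega_i)|^2, \qquad \Psi(\dot{\hat{s}}_\sigma) = (a\omega_i^2)^2 |G_\sigma(\omega_i)|^2. \qquad (4)$$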

Theorem  2  -  Let  =  |y(fg)  -  $(rg)|,  gg= |go(t)(  dr.  Then 

^  3t4in»x)*'8(«i)  [goAaCio)  +  ioAaCgg)  +  TiaAidti  ] 

+  2<i„»x-^<*)tio4i(ig)  +  goAi(gg)  +  2  ggAiCjo)].  4 
Approximations (3), (4) suggest minimum uncertainty filters minimize model
errors:  ffgCru)  =  V2/W2*  exp  (•  then  (2)  are  Gabor  functions. 

III. COMPUTING THE ESA IN NOISE
Define  the  instantaneous  signal-to-noise  ratio;  Sa(0  =  o^(t)  /  r<r,  where 
Trr  is  the  concentration  of  noise  power  in  the  passband  of  gofr): 

r'<r=  ^  |Co(o>)/Co(<»j)|2  «(«)  dm. 

For  Sa  sufficiently  large  it  can  be  shown  using  (3),  (4)  that: 

E(S,*]  -  m,*  {  1  +  — }  -  m,2  (5) 

(Sa  +2)^ 

E{a2)  -fl2{i+J0iff_tl_}.|Go(mj)|2-a2|Go<m^|2.  (6) 

Sa  (Sa  2) 

VsrfS,^  -  4(Sg-l)/(Sgf2)2  (7) 

Vat(S2]  -4a<((5Sgtl)/Sg2]|Co(mj)|<. 

The ratios of (7), (8) to the squares of (5), (6) are negligible for reasonably high
$S_\sigma$, in which case it may be asserted that $\hat{\omega}_i^2 \approx \omega_i^2$ and $\hat{a}^2 \approx a^2 |G_\sigma(\omega_i)|^2$.


IV. MULTIBAND FILTERING AND ESA
Figure 1 diagrams a multiband energy operator: $f(t)$ is passed through multiple
passbands $g_k \leftrightarrow G_k$ with center frequencies $\omega_k$, producing outputs $f_k(t)$.
At time $t$ the maximum normalized energy response (dotted lines in Fig. 1)

$$\Psi^*(t) = \max_k \{ \Psi[f_k(t)] / |G_k(\omega_k)|^2 \}$$

is used by the ESA. Once $H(\omega)$ is selected, tessellate the frequency axis with
translates/dilates of $G_\sigma$. Since the validity of (5)-(8) depends on functions of
$\sigma/\omega_k$, in order to maintain consistent predicted performance across the filter
channels the error bound is made constant by taking $\sigma/\omega_k$ = constant.

V. EXAMPLE

The multiband ESA was applied to the noisy chirp (SNR = 15 dB) with initial
frequency 2480 Hz, a 3000 Hz/sec sweep rate, and a 2 Hz amplitude modulation,
shown in Fig. 2(a). The ESA results with multiband filtering are shown in
Figs. 2(b) and 2(c), indicating excellent estimates of both AM and FM
components; these could be improved even further by post-filtering.


Fig. 1. Multiband filtering / ESA applied to noisy AM-FM signal.


Fig. 2. (a) Noisy AM chirp signal, (b), (c) Computed AM and FM.

Information Theory and Radar Waveform Design

Abstract

The use of information theory to design waveforms for the measurement
of extended radar targets exhibiting resonance phenomena is investigated.
The target impulse response is introduced to model target scattering
behavior. Two radar waveform design problems with constraints on waveform
energy and duration are then solved. In the first, a deterministic target
impulse response is used to design waveform/receiver-filter pairs for the
optimal detection of extended targets in additive noise. In the second, a
random target impulse response is used to design waveforms that maximize
the mutual information between a target ensemble and the received signal
in additive Gaussian noise. The two solutions are contrasted to show the
difference between the characteristics of waveforms for extended target
detection and information extraction. The optimal target detection
solution places as much energy as possible in the largest target
scattering mode under the imposed constraints on waveform duration and
energy. The optimal information extraction solution distributes the
energy among the target scattering modes in order to maximize the mutual
information between the target ensemble and the received radar waveform.

Summary

The application of information theory to radar was originally considered
by Woodward and Davies [1, 2], who used information theoretic ideas to
formulate the a posteriori radar receiver. They also made the observation
that, although radar system design that maximizes the signal-to-noise
ratio at the receiver output achieves the best target detection
performance, it does not necessarily provide the greatest "information
gain" about the target. By "information gain," they were referring to the
mutual information between a random target parameter to be determined and
the measured radar observation of the target. They did not, however,
pursue this idea further and investigate the design of radar waveforms
and receiver filters that maximize the mutual information between the
observed target and the radar measurement of the target. In this talk, we
investigate the problem of optimal waveform and receiver filter design
for both detection and information extraction in the case of extended
radar targets. Detailed treatments of these problems can be found in [3].

First we consider the design of the optimal waveform/receiver-filter pair
for optimal detection of an extended target with a given target impulse
response under constraints on waveform energy and time duration in the
presence of wide-sense stationary additive noise. The receiver filter is
seen to be a straightforward generalization of the matched filter.
However, the overall signal-to-noise ratio is dependent on the
transmitted waveform. The transmitted waveform that maximizes the
signal-to-noise ratio is that which places as much of the transmitted
energy as possible into the largest scattering mode of the target under
the imposed duration and energy constraints. This waveform can be found
by solving a Fredholm integral equation whose kernel is a function of the
target impulse response and the power spectral density of the additive
noise.

Next we examine the problem of designing waveforms that maximize the
mutual information between a random extended target ensemble and the
associated radar measurement in the presence of additive Gaussian noise.
Here, the random target ensemble is modeled by a target impulse response
that is assumed to be a non-stationary finite-energy Gaussian random
process whose spectral-mean and spectral-variance are known. We solve for
the family of waveforms that maximize the mutual information between the
target ensemble and the measurement under constraints on waveform energy
and duration. The resulting family of optimal waveforms can be
interpreted as spreading the energy in the transmitted waveform among the
various target scattering modes in such a way that the mutual information
is maximized. The solution has the spectral form of the "water-pouring
problem" in continuous waveform design, with parameters given in terms of
the target ensemble's spectral-variance and the power spectral density of
the additive noise.

We then note the physical interpretation of radar waveform design in
terms of distributing energy among the various scattering modes of the
target and note the distinct difference between optimal detection
waveforms and optimal information extraction waveforms when viewed in
this context. This serves to illuminate the distinct differences between
optimal waveforms for these two tasks when making measurements of
extended radar targets.

An Upper Bound of the Capacity of Hopfield
Net with Perceptron Algorithm

Abstract

The storage capacity of Hopfield net is discussed based upon the
Perceptron algorithm. We first prove a theorem of linear separability for
the perceptron using the technique of convex analysis. It is then applied
to estimate the memory ratio $c$ for a Hopfield net with $n$ neurons. We
evaluate the probability $P(n, cn)$ that any $cn$ randomly generated
patterns are the attractors of the net. When $\lim_{n \to \infty} P(n, cn) = 1$, the net
is capable of memorizing $cn$ patterns in the form of its equilibria. The
maximum of such $c$'s, denoted by $C_a$, is defined as the capacity of the
net. We obtain an upper bound $C_a < 1/p_0$, where $p_0$ is the solution of
the following equation

=  H 0<^<I

Since $p_0 > 0.227$, we obtain $C_a < 4.41$.

Summary

The dynamics of a Hopfield net with $n$ neurons can be described by an
operator $T: \{\pm 1\}^n \to \{\pm 1\}^n$,

$$y = T(x) = \mathrm{sgn}(Wx - h),$$

where $h$ is the vector of thresholds, $W = (w_{ij})$ is the connection matrix
and sgn is operated componentwise. A state $x$ is an equilibrium state
when it satisfies

$$x = Tx.$$
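
The equilibrium condition $x = Tx$ with $T(x) = \mathrm{sgn}(Wx - h)$ is easy to check numerically. The sketch below is illustrative only: the function names are hypothetical, vectors are plain Python lists of +1/-1, and mapping sgn(0) to +1 is an assumption that the summary does not specify.

```python
def sgn(v):
    """Sign function; sending 0 to +1 is an assumed tie-break."""
    return 1 if v >= 0 else -1

def hopfield_step(W, h, x):
    """One synchronous update T(x) = sgn(W x - h), applied componentwise."""
    return [sgn(sum(wij * xj for wij, xj in zip(row, x)) - hi)
            for row, hi in zip(W, h)]

def is_equilibrium(W, h, x):
    """True when x is a fixed point of T, i.e. x = T(x)."""
    return hopfield_step(W, h, x) == list(x)
```

For a quick experiment one can build `W` from a single pattern `p` as the outer product with zero diagonal and confirm that `p` is an equilibrium state.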

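Given an arbitrary set of $m$ desired memories $\xi(1), \xi(2), \ldots, \xi(m)$, these
vectors should indeed be stable vectors, i.e.,

$$\xi_i(l) = \mathrm{sgn}\Big(\sum_{j=1}^{n} w_{ij}\,\xi_j(l) - h_i\Big)$$

for any $l = 1, 2, \ldots, m$, $i = 1, 2, \ldots, n$. Based upon the Perceptron
algorithm, $W$ can be chosen to satisfy the following set of inequalities

$$\xi_i(l)\Big(\sum_{j=1}^{n} w_{ij}\,\xi_j(l) - h_i\Big) > 0. \qquad (1)$$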


When $\{\xi_i(l) : l = 1, 2, \ldots, m;\ i = 1, 2, \ldots, n\}$ are randomly generated
patterns, or more precisely, the $\xi_i(l)$ are independent random variables
taking 1 and -1 with probability 1/2 each, we define by $P(n, m)$ the
probability that the set of inequalities (1) has a solution. We say a rate
$c$ is achievable if

$$P(n, cn) \to 1 \quad \text{as } n \to \infty.$$

The capacity $C_a$ of the Hopfield net under the Perceptron algorithm is
defined as the supremum of the achievable rates.

There have been some works regarding the memory capacities of different
types under different learning algorithms for Hopfield networks. Among
them Gardner (1988) discussed another type of capacity for the Hopfield
net with Perceptron algorithm and obtained an upper bound of 2. In this
work we first prove a theorem of linear separability for the perceptron
using the technique of convex analysis. It is then applied to estimate the
capacity $C_a$ for the Hopfield net. We obtain an upper bound

$$C_a < 1/p_0,$$

where $p_0$ is the solution of the following equation

Since $p_0 > 0.227$, we get $C_a < 4.41$.

This work is supported by the Chinese Tian Yuan Foundation.

Corrective Memory by a Symmetric Sparsely Encoded Network

Y. Baram, Department of Computer Science, Technion, Israel Institute of Technology, Haifa 32000,
Israel.

A neural network that retrieves stored binary vectors, when probed by possibly corrupted versions
of them, is presented. It employs sparse ternary internal coding and autocorrelation (Hebbian)
storage. It is symmetrically structured and, consequently, can be folded into a feedback
configuration. Bounds on the network parameters are derived from probabilistic considerations.

The asymptotic storage capacity is shown to be arbitrarily close to linear in the network size,
which is exponential in the input dimension. The performance of a finite-size symmetric network
is examined by simulation and found to be substantially higher than that of Kanerva's seminal
model, operating as a content addressable memory.

Strong Universal Consistency of Neural Network Classifiers

ABSTRACT In statistical pattern recognition a classifier is called
universally consistent if its error probability converges to the
Bayes-risk as the size of the training data grows, for all possible
distributions of the random variable pair of the observation
vector and its class. We prove that if a one layered neural network
is trained to minimize the empirical risk on the training data,
then it results in a universally consistent classifier if the number
of nodes $k$ is chosen such that $k \to \infty$ and $k \log(n)/n \to 0$ as
the size of the training data $n$ grows to infinity. We show that
if certain smoothness conditions on the distribution are satisfied,
then by choosing $k = O(\sqrt{n/\log(n)})$, the exponent in the rate of
convergence does not depend on the dimension.

I.  INTRODUCTION 


The pattern classification problem can be formulated as follows:
Let the random variable pair $(X, Y)$ take its values from
$\mathbb{R}^d \times \{0, 1\}$. $X \in \mathbb{R}^d$ is called the observation (or feature) vector,
while $Y \in \{0, 1\}$ is its class. Observing $X$ one wants to guess
the value of $Y$ by a classification rule $g: \mathbb{R}^d \to \{0, 1\}$ such that
the error probability $\Pr\{g(X) \ne Y\}$ be small. The best possible
classification rule is given by

$$g^*(x) = \begin{cases} 0 & \text{if } P_0(x) > 1/2 \\ 1 & \text{otherwise} \end{cases}$$

where $P_0(x) = \Pr\{Y = 0 \mid X = x\}$ is the a posteriori probability
of class 0. The minimal error probability $L^* = \Pr\{g^*(X) \ne Y\}$
is called the Bayes-risk. In practice, the a posteriori probabilities
are rarely known; instead, a training sequence


$$\xi_n = ((X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)) \qquad (1)$$

is available, where $(X, Y), (X_1, Y_1), \ldots, (X_n, Y_n)$ are independent,
identically distributed (i.i.d.) random variable pairs. Now, one
can estimate $Y$ by $g_n(X) = g_n(X, \xi_n)$, a measurable function of
the observation and the training sequence. The error probability
of $g_n$ is denoted by

$$L(g_n) = \Pr\{g_n(X) \ne Y \mid \xi_n\}.$$

A sequence of rules is strongly universally consistent if

$$\lim_{n \to \infty} L(g_n) = L^* \quad \text{with probability one} \qquad (2)$$

for any distribution of $(X, Y)$.

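A classification rule realized by a feedforward neural network
with one (hidden) layer can be expressed as

$$g(x, \theta_k) = \begin{cases} 0 & \text{if } f(x, \theta_k) > 0 \\ 1 & \text{otherwise,} \end{cases} \qquad (3)$$

where

$$f(x, \theta_k) = \sum_{i=1}^{k} c_i\, \sigma(a_i^T x + b_i) + c_0. \qquad (4)$$

Here $k$ is the number of nodes (hidden neurons), and $\theta_k =
(a_1, \ldots, a_k, b_1, \ldots, b_k, c_0, c_1, \ldots, c_k)$ is the parameter vector of the
neural network ($a_i \in \mathbb{R}^d$, $c_0, b_i, c_i \in \mathbb{R}$, $i = 1, \ldots, k$). Here we
assume that the sigmoid $\sigma$ is the step function

$$\sigma(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x \le 0. \end{cases}$$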


Our goal is to choose the number of nodes $k$ and set the parameters
such that the error probability $\Pr\{g(X, \theta_k) \ne Y\}$ be small.
Our strategy is to minimize the empirical error; in other words, we
choose a parameter vector $\theta^*_{k,n}$ for which the corresponding
classification rule $g^*_{k,n}(x) = g(x, \theta^*_{k,n})$ commits the minimum number
of errors on the training sequence:

$$\hat{L}(g^*_{k,n}) = \min_{\theta_k} \hat{L}(g(\cdot, \theta_k)), \qquad (5)$$

where

$$\hat{L}(g(\cdot, \theta_k)) = \frac{1}{n} \sum_{j=1}^{n} I_{\{g(X_j, \theta_k) \ne Y_j\}}$$

is the empirical error probability of the classification rule $g(x, \theta_k)$.
($I_A$ denotes the indicator of an event $A$.)
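
The quantities in this section can be made concrete with a small sketch that evaluates a one-hidden-layer step network and its empirical error on a labeled sample. It only scores a given parameter vector; the minimization over parameters is not attempted. All function names and the list-based parameter layout are assumptions made for illustration.

```python
def step(u):
    """Step sigmoid: +1 for u > 0, -1 otherwise (assumed convention at 0)."""
    return 1 if u > 0 else -1

def f(x, a, b, c, c0):
    """Network output: sum_i c_i * sigma(a_i . x + b_i) + c_0."""
    total = c0
    for ai, bi, ci in zip(a, b, c):
        pre = sum(aij * xj for aij, xj in zip(ai, x)) + bi
        total += ci * step(pre)
    return total

def g(x, a, b, c, c0):
    """Classification rule: class 0 if f > 0, class 1 otherwise."""
    return 0 if f(x, a, b, c, c0) > 0 else 1

def empirical_error(data, a, b, c, c0):
    """Fraction of training pairs (x, y) that the rule misclassifies."""
    return sum(g(x, a, b, c, c0) != y for x, y in data) / len(data)
```

A training procedure in the spirit of the text would search over `(a, b, c, c0)` for the parameters with the smallest `empirical_error`.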

Our  main  result  is  that  such  parameter  selection  has  very  good 
properties.  In  particular,  we  can  show  the  following: 


Theorem 1 If the number of nodes $k$ is chosen to satisfy

$$k \to \infty \qquad (6)$$

and

$$\frac{k \log(n)}{n} \to 0 \qquad (7)$$

as $n \to \infty$, then

$$\lim_{n \to \infty} L(g^*_{k,n}) = L^* \quad \text{with probability one,}$$

regardless of the distribution, that is, the sequence of rules $\{g^*_{k,n}\}$
is strongly universally consistent.


The  proof  is  based  on  the  celebrated  result  of  Cybenko  and  Hornik, 
Stinchcombe  and  White,  that  functions  realized  by  networks  with 
one  hidden  layer  are  dense  in  the  class  of  continuous  functions, 
and  the  Vapnik-Chervonenkis  inequality. 

Applying  Barron’s  results  we  can  estimate  the  rate  of  conver¬ 
gence  for  smooth  distributions: 

Theorem 2 Assume that there is a compact set $A \subset \mathbb{R}^d$ such
that $\Pr\{X \in A\} = 1$ and the Fourier transform $\tilde{P}_0(\omega)$ of $P_0(x)$
satisfies $\int |\omega|\,|\tilde{P}_0(\omega)|\,d\omega < \infty$. Then

If, further, the number of neurons is chosen to be $k =
O(\sqrt{n/(d \log(n))})$, then

Sample Size Requirements of Feedforward Neural Network Pattern Classifiers*

Terrence L. Fine and Michael J. Turmon
E&TC388 

Cornell  University  School  of  Electrical  Engineering 
Ithaca,  NY  14853 


We  investigate  the  tradeoffs  among  network  complexity, 
training  set  size,  and  statistical  performance  of  feedforward 
neural  networks. 

Nets, labeled as functions $\eta: \mathbb{R}^d \to \{0, 1\}$, classify input
points $x \in \mathbb{R}^d$ as either type 0 or type 1. The architecture
of all nets under consideration is $\mathcal{N}$, whose size is gauged by
its VC dimension $v$, the size of the largest set of points the
architecture can classify in any desired way. Nets $\eta \in \mathcal{N}$ are
chosen on the basis of a training set $T = \{(x_i, y_i)\}_{i=1}^{n}$. These $n$
samples are i.i.d. according to an unknown probability law $P$.
Performance of a network is measured by the error probability

$$\mathcal{E}(\eta) = P(\eta(x) \ne y),$$

and a good (perhaps not unique) net in the architecture is

$$\eta^0 = \arg\min_{\eta \in \mathcal{N}} \mathcal{E}(\eta).$$

To select a net using the training set we employ the empirical
error frequency

$$\nu_T(\eta) = \frac{1}{n} \sum_{i=1}^{n} 1\{\eta(x_i) \ne y_i\}$$

sustained by $\eta$ on the training set $T$. A good choice for a
classifier is then

$$\eta^* = \arg\min_{\eta \in \mathcal{N}} \nu_T(\eta).$$


By definition $\mathcal{E}(\eta^*) \ge \mathcal{E}(\eta^0)$, and in fact arguments in Vapnik
[5] can be adapted to yield the VC upper bound

$$P(\mathcal{E}(\eta^*) - \mathcal{E}(\eta^0) > \epsilon) \le \cdots$$

This inequality shows that sample sizes of about

$$n_c = \frac{16 v}{\epsilon^2} \log\Big(\frac{6}{\epsilon}\Big)$$

are sufficient to obtain a small probability of a discrepancy
of more than $\epsilon$ between $\mathcal{E}(\eta^*)$ and $\mathcal{E}(\eta^0)$. If for purposes of
illustration we take $\epsilon = .1$, $v = 50$, we find that $n_c = 328000$,
which disagrees by orders of magnitude with the experience of
practitioners who train such low-complexity networks (about
50 connections).

One way to close this gap between theoretical guidelines
and practical experience is to obtain a tighter upper bound.
One source of the discrepancy is the union bound employed
in the VC development, a tighter version of which is given by
Naiman and Wynn [3]:

$$\sum_{1 \le i \le N} P(A_i) - \sum_{1 \le i < j \le N} P(A_i \cap A_j) \le P\Big(\bigcup_{1 \le i \le N} A_i\Big) \le \cdots$$

However, we have shown that these pairwise corrections reduce
the upper bound by at most a multiplicative factor of $n$,
which is insignificant compared to other factors entering
exponentially, while the lower bound becomes trivial.

The number obtained via VC theory represents a sufficient
condition on sample size to obtain reliable classification.
To supplement this we have obtained a lower bound or a necessary
condition on the training set size needed to obtain reliable
classification by examining in detail the error terms for a
perceptron under multivariate normal input. Suppose the observed
data $X$ has equal prior probability of being $N(\mu_0, I_d)$ or
$N(\mu_1, I_d)$, and that $n/2$ correctly classified samples are gathered
from each prior. When the means are known, the classifier
$\eta^0$ minimizing error probability is

$$\eta^0(x) = 1/2 - \mathrm{sgn}\big((x - (\mu_0 + \mu_1)/2)^T(\mu_0 - \mu_1)\big)/2,$$

and $\mathcal{E}(\eta^0) = \Phi(-\Delta/2)$ where $\Delta^2 = (\mu_0 - \mu_1)^T(\mu_0 - \mu_1)$ and $\Phi$
is the distribution of $N(0, 1)$. The empirically chosen classifier
when the means are unknown is formed by substituting the
sample means under each hypothesis, $\bar{x}_0$ and $\bar{x}_1$, into $\eta^0$:

$$\eta^*(x) = 1/2 - \mathrm{sgn}\big((x - (\bar{x}_0 + \bar{x}_1)/2)^T(\bar{x}_0 - \bar{x}_1)\big)/2.$$

$\mathcal{E}(\eta^*)$ is hard to find (see [1], sec. 6.6), but it can be
approximated using arguments in the spirit of Raudys [4]. The
condition necessary for reliable classification becomes

$$\mathcal{E}(\eta^*) - \mathcal{E}(\eta^0) \approx \Phi\big(-(\Delta/2)(1 + 4d/n\Delta^2)^{-1/2}\big) - \Phi(-\Delta/2) < \epsilon,$$

uniformly over all values of $\Delta$. Analysis reveals that meeting
the above condition requires a sample size that is
lower than the VC sufficient condition by a factor of order
just $\log(1/\epsilon)$. For this special case, it also improves on the
necessary condition $n > v/32\epsilon$ obtained by Baum and Haussler
[6]. This result confirms that the VC bound is relatively tight,
and demonstrates that practitioners are overly optimistic when
using small sample sizes.
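
The two sample-size expressions discussed in this summary can be evaluated numerically. The sketch below assumes the sufficient size reads n_c = (16 v / eps^2) log(6 / eps) with a natural logarithm, which reproduces the quoted figure of about 328000 for eps = 0.1 and v = 50, and it evaluates the Gaussian approximation to the excess error as a function of n, d and the mean separation. The function names are hypothetical.

```python
from math import log, sqrt, erf

def vc_sufficient_sample_size(v, eps):
    """Assumed form of the VC sufficient sample size: (16 v / eps^2) ln(6 / eps)."""
    return 16 * v / eps ** 2 * log(6 / eps)

def std_normal_cdf(z):
    """Distribution function of N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def gaussian_excess_error(n, d, sep):
    """Approximate excess error of the plug-in classifier for two Gaussians
    in d dimensions whose means are `sep` apart, trained on n samples."""
    shrink = (1 + 4 * d / (n * sep ** 2)) ** -0.5
    return std_normal_cdf(-(sep / 2) * shrink) - std_normal_cdf(-sep / 2)
```

Scanning `gaussian_excess_error` over a range of separations for fixed `n` and `d` gives a feel for how many samples are needed before the excess error stays below a target `eps`.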

We address the problem of computing the COMPARISON and
ADDITION functions of two n-bit numbers using circuits of
(non-monotone) MAJORITY gates.

Given $n$ Boolean variables $x_1, \ldots, x_n \in \{-1, 1\}$, a non-monotone
MAJORITY gate (in the variables $x_i$) is a Boolean function whose
value is the sign of $\sum_{i=1}^{n} c_i x_i$, where each $c_i$ is either 1 or -1.

We  construct  an  explicit  sparse  polynomial  whose  sign  computes 
the  COMPARISON  function  of  two  integers.  Similar  polynomials 
are  constructed  for  computing  all  the  bits  of  the  summation  of 
the  two  integers.  This  supplies  explicit  constructions  of  depth-2 
polynomial-size circuits computing these functions, which use only
non-monotone  MAJORITY  gates.  These  constructions  are  opti¬ 
mal  in  terms  of  the  depth  and  can  be  used  to  obtain  the  best  known 
explicit  constructions  of  MAJORITY  circuits  for  other  functions 
like  the  product  of  two  n-bit  numbers  and  the  maximum  of  n 
n-bit  numbers  (see  [3]  and  [6]).  A  crucial  ingredient  in  our  ap¬ 
proach  is  the  construction  of  a  discrete  version  of  a  sparse  “delta 
polynomial” — one  that  has  a  large  absolute  value  for  a  single  as¬ 
signment  and  extremely  small  absolute  values  for  all  other  assign¬ 
ments.  We  construct  sparse  delta  polynomials  using  generator 
matrices  of  certain  linear  block  codes. 

In  the  rest  of  this  summary  we  sketch  the  ideas  related  to  the 
construction  for  the  COMPARISON  function.  More  details  and 
related  results  appear  in  [1]. 

Let $X = (x_n, x_{n-1}, \ldots, x_1)$ and $Y = (y_n, y_{n-1}, \ldots, y_1)$ be two
vectors in $\{1, -1\}^n$. Let $a$ and $b$ be the integers that correspond to
$X$ and $Y$, respectively. Since our convention is that a logical 0 is
represented by 1 and a logical 1 is represented by -1 this means
that $a = \sum_{i=1}^{n} \frac{1 - x_i}{2} 2^{i-1}$ and $b = \sum_{i=1}^{n} \frac{1 - y_i}{2} 2^{i-1}$. The COMPARISON
function, $C(X, Y)$, is the Boolean function which is -1 iff
$a \ge b$.

Next we introduce the concept of a sparse delta polynomial. A
polynomial is called sparse if it is the sum of at most polynomially
many (in $n$) monomials. For a vector $\tau = (\tau_1, \ldots, \tau_n)$, where $\tau_i \in \{-1, 1\}$, and
for a positive real $\epsilon$, we call a polynomial $P(x_1, \ldots, x_n)$ a delta
polynomial for $\tau$ and $\epsilon$ if there are two positive constants $d$ and $c$
satisfying $d/c > \epsilon$ such that:

(i) $P(\tau_1, \ldots, \tau_n) = d$ and

(ii) For all $(x_1, \ldots, x_n) \in \{-1, 1\}^n$ which satisfy $(x_1, \ldots, x_n) \ne \tau$,
$|P(x_1, \ldots, x_n)| < c$.

Our construction of delta polynomials can be obtained by using
linear error-correcting codes over $GF(2)$ with length which is
polynomial in the dimension and with the property that the Hamming
weight of any non-zero codeword is sufficiently close to half the
length. Let $A = (a_{ij})_{1 \le i \le n,\, 1 \le j \le t}$ be the generator 0,1-matrix of
a linear error-correcting code of length $t$ and dimension $n$, and
suppose that the Hamming weight of each non-zero codeword is
between $(1 - \epsilon)\frac{t}{2}$ and $(1 + \epsilon)\frac{t}{2}$. Let $P_A = P_A(x_1, \ldots, x_n)$ be the
polynomial defined by $P_A(x_1, \ldots, x_n) = \sum_{j=1}^{t} \prod_{i=1}^{n} x_i^{a_{ij}}$. Clearly
$P_A(1, \ldots, 1) = t$, and it is not difficult to check that for every
$(x_1, \ldots, x_n) \in \{-1, 1\}^n$ which is not $(1, \ldots, 1)$, $|P_A(x_1, \ldots, x_n)| \le
\epsilon t$, since $P_A(x_1, \ldots, x_n)$ is precisely the difference between the
number of 0's and the number of 1's in the codeword defined by the
sum (in $GF(2)$) of all rows $i$ of $A$ such that $x_i = -1$. Linear
codes as above with length $t$ polynomial in the dimension $n$ and
with $\epsilon$ inverse polynomial in the dimension are the duals of BCH
codes [4], as well as other more recent constructions that have
applications in derandomization of algorithms [2, 5].
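
The link between the polynomial $P_A$ and codeword weights can be checked on a toy code. The sketch below evaluates $P_A(x) = \sum_j \prod_i x_i^{a_{ij}}$ and, separately, the difference between the number of 0's and 1's in the codeword obtained by adding (over GF(2)) the rows selected by $x_i = -1$. The function names are hypothetical, and the [7,3] simplex code suggested in the usage note is chosen only as a small example with constant-weight non-zero codewords; it is not the code family used in the paper.

```python
from itertools import product

def poly_from_generator(A, x):
    """Evaluate P_A(x) = sum over columns j of prod over rows i of x_i^{a_ij}."""
    n, t = len(A), len(A[0])
    total = 0
    for j in range(t):
        term = 1
        for i in range(n):
            if A[i][j]:
                term *= x[i]
        total += term
    return total

def codeword_balance(A, x):
    """(# of 0's) - (# of 1's) in the GF(2) sum of the rows i with x_i = -1."""
    word = [0] * len(A[0])
    for row, xi in zip(A, x):
        if xi == -1:
            word = [w ^ a for w, a in zip(word, row)]
    return word.count(0) - word.count(1)

def all_sign_vectors(n):
    """All vectors in {1, -1}^n."""
    return list(product([1, -1], repeat=n))
```

With the generator matrix whose 7 columns are all non-zero 3-bit vectors, every non-zero codeword has weight 4, so `poly_from_generator` returns 7 at the all-ones point and -1 everywhere else.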

The following theorem gives the construction for COMPARISON
which is based on a sparse delta polynomial with $n$ variables
denoted by $P(\,)$.

Theorem 1 Let $m_k(X, Y) = P(x_n y_n, x_{n-1} y_{n-1}, \ldots, x_{k+1} y_{k+1})$.
Define $\tilde{C}(X, Y) = m_0(X, Y) + \sum_{i=1}^{n} (y_i - x_i)\, m_i(X, Y)$.

Then $C(X, Y) = \mathrm{sign}(-\tilde{C}(X, Y))$.
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Abstract 

The capacity $C_b$ of two-layer ($N - 2L - 1$) feedforward
neural networks is shown to satisfy the relation
$O(W/\ln W) \le C_b \le O(W)$. Here $N - 2L - 1$
stands for the networks with $N$ input units, $2L$
hidden units and one output unit. $W$ is the
total number of weights of the networks. The
weights take only binary values and the hidden
units have integer thresholds.

Summary 

The motivation for this work comes from hardware
implementation of neural networks. When
weights of neural networks are implemented,
both their accuracy and magnitude have to be
limited. A natural question to ask is then
whether the learning capability of neural networks
will be affected.

Learning capability of neural networks can be
characterized by their information capacity [2],
which is defined as the total number of dichotomies
implementable by a class of networks
of the same architecture. The capacity $C$ of two-layer
$N - L - 1$ feedforward networks with analog
weights has been shown to satisfy the relation
$O(W) \le C \le O(W \ln L)$ [1]. Here $W$, $L$ and $N$
are the total number of weights, the number of
hidden units and the input dimension, respectively.
It remains an open question, however,
what the capacity of multilayer networks would
be if their weights can only take discrete values.
In this work we answer this question by evaluating
the capacity of two-layer $N - 2L - 1$ feedforward
networks ($N$ inputs, $2L$ hidden units and 1
output) with binary weights and integer thresholds
for the hidden units.


Specifically, upper and lower bounds for the capacity
$C_b$ of such networks are established in
two steps. First, the statistical capacity [3] of a
specifically constructed network is evaluated and
found to be $O(W/\ln W)$, where $W$ is the total number
of weights of the network. It is used as a
lower bound for the capacity $C_b$. Then an upper
bound is obtained through a simple counting argument,
and shown to be $O(W)$. Therefore, we
have $O(W/\ln W) \le C_b \le O(W)$.

This result shows that when the analog
weights are reduced to binary values, the capacity of
two-layer networks is reduced by at most a log
factor. This is consistent with what has been
found for a single neuron with binary weights [4].
Therefore, even with binary weights only, multilayer
neural networks still have strong learning
capability.
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Abstract 

The  construction  of  fixed-rate  vector  quantizers  based  on  entropy- 
coded  scalar  quantizers  has  been  suggested  in  [1].  In  [2],  a  particular 
case  of  such  quantizers  based  on  prefix-coded  scalar  quantizers  was 
suggested  and  a  very  simple  binary  encoding  scheme  was  presented  as 
well  as  several  reduced-complexity  search  methods.  In  this  paper,  we 
use  arithmetic  coding  as  the  binary  encoding  scheme  and  show  that 
good  performance  and  low  complexity  are  attained. 

Summary 

The basic idea is as follows: Let $Q = (q_1, q_2, \ldots, q_m)$ be a
vector of levels such that $q_1 < q_2 < \cdots < q_m$. Let $f_e$ be an
encoding scheme that encodes $n$-dimensional vectors from
$Q^n = \{q_1, q_2, \ldots, q_m\}^n$ into binary strings, and for any such
vector $y$, let $\ell(y) = l(f_e(y))$ be the length of the corresponding
binary string. We define the quantizer codebook as
$C = \{y \in Q^n : \ell(y) \le nr\}$, where $r$ is the desired rate. We
assume that the produced binary string can be completed to length
$\lfloor nr \rfloor$ with properly chosen dummy bits without affecting the
decodability. In such a case, using $f_e$, $C$ can be encoded with
$\lfloor nr \rfloor$ bits, i.e. with a rate of $\lfloor nr \rfloor / n \le r$ bits per
dimension.

With proper choices of $f_e$, one can easily see that the quantizers
in [1] and [2] fit the above description.

An important issue here is quantizing a given $n$-dimensional
source vector $x$, i.e. finding a vector in $C$ nearest to $x$. The
complexity of this task depends on the structure of $C$, which is
determined by $f_e$. In the case of [1, 2], the condition $\ell(y) \le nr$
is replaced by a condition of the form $\sum_i l(y_i) \le L$, where $l(\cdot)$
is a length function defined on $Q$ that assumes positive integer
values. In this case, it was shown in [1] that quantization can be
performed using a dynamic programming search. In [2], several
reduced-complexity, suboptimal search methods have been proposed.
In particular, the Lagrange-multiplier-based (LM-based)
method of [2] can be applied to more general cases like the case
of arithmetic coding, discussed below.
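
To make the additive length constraint concrete, here is a minimal dynamic-programming sketch of the search for the nearest vector $y$ with $\sum_i l(y_i) \le L$ under squared error. It is only an illustration of the idea, not the algorithm of [1]; the function name `bcq_quantize` and the toy levels, lengths and budgets are hypothetical.

```python
def bcq_quantize(x, levels, lengths, budget):
    """Return (y, distortion): the vector y in levels^n closest to x in
    squared error among those with sum of per-level lengths <= budget."""
    # best[b] = (distortion, chosen level indices) over prefixes using exactly b bits
    best = {0: (0.0, [])}
    for xi in x:
        nxt = {}
        for used, (dist, path) in best.items():
            for j, (q, l) in enumerate(zip(levels, lengths)):
                b = used + l
                if b > budget:
                    continue  # this choice would exceed the length budget
                d = dist + (xi - q) ** 2
                if b not in nxt or d < nxt[b][0]:
                    nxt[b] = (d, path + [j])
        best = nxt
    if not best:
        raise ValueError("no codevector satisfies the length budget")
    dist, path = min(best.values(), key=lambda item: item[0])
    return [levels[j] for j in path], dist


# Hypothetical toy alphabet: the middle level is cheap, the outer levels cost more.
y_tight, d_tight = bcq_quantize([0.9, 0.8], [-1.0, 0.0, 1.0], [2, 1, 2], budget=3)
y_loose, d_loose = bcq_quantize([0.9, 0.8], [-1.0, 0.0, 1.0], [2, 1, 2], budget=4)
```

With the tighter budget only one coordinate can use an outer level, so the search spends it on the coordinate where it reduces the error most. The table has at most `budget + 1` entries per coordinate, which keeps the search polynomial in $n$ and $L$.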

Now, we consider the case when $f_e$ is an arithmetic encoding
rule. The use of arithmetic codes is motivated by their simplicity
and the fact that arithmetic coding has a rate very close to the
entropy of the encoded source.

The major problem with arithmetic coding is that it is hard to
precisely calculate the number of bits produced unless the actual
encoding is done. Therefore, we modify the definition of $C$ above
so that only a simple upper bound to $\ell(y)$ is constrained. For
this purpose, we use Pasco's arithmetic codes [3], which permit a
tighter upper bound to $\ell(y)$.

In detail, let $p(q_j)$ be a $J$-bit, positive fraction corresponding
to $q_j$. For a given positive integer $K$, an arithmetic code
based on Pasco's method [3] will encode $y$ in such a way that
$\ell(y) \le \lceil -\sum_i \log_2 p(y_i) + nV(K) \rceil \le \lfloor -\sum_i \log_2 p(y_i) + 1 + nV(K) \rfloor$
bits, where $V(K) = -\log_2(1 - 2^{-K})$. The quantity $nV(K)$ can
be made as small as desired by increasing $K$; however, this will
increase the complexity of the arithmetic code, since $K$ together
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with  J  determine  the  precision  of  the  arithmetic  operations  in¬ 
volved. Therefore, we replace the condition $\ell(y) \le nr$ in the
definition of $C$ by the condition
$\lfloor -\sum_i \log_2 p(y_i) + 1 + nV(K) \rfloor \le nr$.
We call the resulting systems arithmetic-encoded,
block-constrained quantizers (ae-BCQ).

The major disadvantage of Pasco's implementation is that it
uses multiple-precision arithmetic. We solve this problem by
showing that Pasco's codes can be implemented with fixed-point
arithmetic using $(J + K + 1)$-bit registers, in a way very similar
to the implementation of Jones [4].

Table 1 shows the performance of ae-BCQ for various values of
$n$ and $r$ obtained by simulation for IID Gaussian and Laplacian
sources and mean-squared-error distortion. The systems are optimized
using variations of the methods described in [1, 2] and the
initial parameters are based on the optimal, entropy-constrained,
scalar quantizer (ECSQ) of the given rate. The quantization is
performed using the LM-based method of [2]. From the table,
we also see that the SNR of ae-BCQ with $n = 128$ is comparable
to the SNR of the pe-BCQ scheme of [2] with $n = 192$ (a
little better at low rates, a little worse at high rates). The latter
has much higher quantization complexity, and somewhat less encoding
complexity. Also, with very little loss in SNR, the LM-based
method can be used with pe-BCQ with the same complexity
advantages [2]. The original SVQ of [1] with $n = 32$ has almost the
same search complexity as pe-BCQ with $n = 192$ and significantly
larger encoding complexity.
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Table  1;  SNR  (dB)  for  ae-BCQ  and  other  methods 
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ABSTRACT 

Representations and statistical properties of the process
$\epsilon$ defined by

$\epsilon_{n+1} = \lambda(\epsilon_n + \xi_n)$,

where

$\lambda(u) = u - b \cdot \mathrm{sign}(u) + m$,

are given, when $\xi$ is a Gaussian white noise. The process
$\epsilon$ represents the binary quantiser error in a model for
(single loop) Sigma Delta modulation, see [3, 6]. The existence
and uniqueness of an invariant probability measure,
ergodicity properties as well as the existence of moments
w.r.t. the invariant probability are proved using Markov
process theory. Considering $\epsilon$ as a random perturbation
of the orbits of

$s_{n+1} = \lambda(s_n)$,

the structure of the power spectrum of the quantiser error
is studied approximately for small values of the white
noise variance using the deterministic signal $s_n$ under a
uniform invariant distribution.

Summary 

The binary quantiser error process $\epsilon := \{\epsilon_n\}$ as defined
above is a real valued, discrete time Markov process. An
invariant probability measure, denoted by $\pi$, is a probability
measure satisfying $\pi(A) = \int_{\mathbb{R}} \pi(dv)\, P^1(v; A)$, for
any (Borel) set $A$, where $P^1(v; A)$ designates the one step
transition probability of the binary quantiser error. Then
we have:

Let $\lambda(u) = u - b \cdot \mathrm{sign}(u) + m$, let $\xi$ be a Gaussian white noise
with the variance $\sigma^2$ and let $U_{n+1} = \lambda(U_n) + \xi_n$. Then the
binary quantiser error $\epsilon_n = \lambda(U_n)$ has a unique invariant
probability if and only if $|m| < b$.

The process $\epsilon$ is thus positively recurrent. We also show
that if $b - |m|$ is sufficiently large then the process $\epsilon$ has
moments of all orders w.r.t. the invariant probability measure.
Exponential moments of sufficiently high order with respect to
$\pi$ do not exist, which shows that the invariant probability $\pi$
differs substantially from the normal distribution.

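A study of the detail in (1) and the representations in
[3, 6] yields the stationary characteristic function $\varphi_\epsilon(\omega)$
of the binary quantiser error process $\epsilon$ as

$E[e^{i\omega\epsilon_n}] = e^{i\omega m} \cdot \frac{\sin(b\omega)}{b\omega} \cdot \theta(\omega)$

with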

0(w) e(’5^”. 

where  j  and  ip{x\  o)  'Ti^*'**^***-  Evidently 

this shows that $\epsilon_n$ can in the stationary state be split into
a sum of two independent random variables,

$\epsilon_n \stackrel{d}{=} g_n + m + o_n$ (Fine's Decomposition),

where $g_n$ ('granular noise') is uniformly distributed in
$(-b, b]$ and $o_n$ ('slope overload noise') has the distribution
determined by $\theta(\omega)$ and $\stackrel{d}{=}$ means equality in
distribution. Another important formula here is the following
explicit solution of $\epsilon^0_{n+1} = \lambda(\epsilon^0_n)$ due to [3, Thm. 1,
eq. (3.8), p. 486]:

$\epsilon^0_n = 2b\langle x + n\beta \rangle + m - b$ (Gray's formula),

where $\langle x \rangle = x - \lfloor x \rfloor$ is the fractional part of $x$,
$x = (\epsilon^0_0 - m + b)/(2b)$ and $\beta := (m + b)/(2b)$. Here we must
assume $|m| < b$.
Using these decompositions of $\epsilon_n$ and Hermite expansions
of the nonlinearities involved we show that the
power spectrum of the binary quantiser error has distinct
peaks at integer multiples of $\beta$ (frequency folded to
$[-1/2, 1/2]$) for small values of $\sigma$. This follows by
investigating the autocorrelation

fh— 1 

rt(n)  Bt*«-^l  =  -5ZBW(®‘  +  &)«ol-t- 

i»0 

-fn  -m-B  po)  -i-  E  [^] . 

Here we calculate using $\epsilon^0_n$, which is a function of $g_0$
in view of Gray's formula and since $\epsilon^0_0 = g_0 + m$, and
obtain after conditioning on $\epsilon^0_n$, as the Gaussian noise is
independent of the sequence $\epsilon^0_n$, that

B  IQ  (el  -t-  C.)  •  »)1  =  » •  £•.  [«•/  (el/v/ar)  ■  go] , 

where $E_{g_0}$ denotes the expectation with respect to the
uniform invariant distribution of the random variable
$g_0$. Inserting Gray's formula and using $\epsilon^0_0 - m = g_0$ this
gives

ElQ(ei*+6)s»))-^  j  xf{x-l-i(m  +  b))dx, 

where  we  have  Introduced  the  auxiliary  functico  /(x) 

^  -f  ro  -  b}  />/5o).  Since  (x  -f  1}  —  (x>  for 

every $x \in \mathbb{R}$, $f(\cdot)$ has the period $2b$. Using the (complex)
Fourier coefficients corresponding to $f(\cdot)$ we derive that
if $\beta$ is an irrational number then $r_\epsilon(n) := E[\epsilon_n \cdot \epsilon_0]$
can be expressed as

for $n > 0$. This agrees with the studies of the deterministic
spectrum of the binary quantiser error, see [1], as well
as with the simulated results.
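
The noise-free recursion and its closed-form solution can be compared numerically. The sketch below assumes the map $\lambda(u) = u - b\,\mathrm{sign}(u) + m$ with $\mathrm{sign}(0) = +1$ and the closed form $2b\langle x + n\beta \rangle + m - b$ with $\beta = (m + b)/(2b)$ and $x = (\epsilon_0 - m + b)/(2b)$. The function names and the parameter values ($b = 1$, $m = \sqrt{2} - 1$, start value 0.1) are illustrative assumptions, not taken from the text.

```python
import math
import random


def lam(u, b, m):
    """lambda(u) = u - b*sign(u) + m, taking sign(0) = +1 (an assumption)."""
    return u - (b if u >= 0 else -b) + m


def iterate(e0, b, m, steps, sigma=0.0, rng=None):
    """Orbit of e_{n+1} = lambda(e_n + xi_n); xi_n is N(0, sigma^2) when rng is given."""
    orbit = [e0]
    for _ in range(steps - 1):
        noise = rng.gauss(0.0, sigma) if rng is not None else 0.0
        orbit.append(lam(orbit[-1] + noise, b, m))
    return orbit


def closed_form(e0, b, m, steps):
    """Noise-free orbit from 2b*frac(x + n*beta) + m - b, valid for |m| < b
    and a start value in [m - b, m + b)."""
    beta = (m + b) / (2 * b)
    x = (e0 - m + b) / (2 * b)
    return [2 * b * ((x + n * beta) % 1.0) + m - b for n in range(steps)]


b, m, e0 = 1.0, math.sqrt(2) - 1, 0.1
recursive = iterate(e0, b, m, steps=20)
explicit = closed_form(e0, b, m, steps=20)
noisy = iterate(e0, b, m, steps=20, sigma=0.05, rng=random.Random(0))
```

The lists `recursive` and `explicit` should agree up to rounding, and both stay inside $[m - b, m + b)$. Passing a small `sigma` gives a sample path of the perturbed process for comparison.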
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The multiple descriptions problem is a generalization of the
problem of source coding subject to a fidelity criterion [1]. In its
simplest form, two channels, each with its own rate constraint,
connect the source to the user. Either channel may be broken
at any given time. The objective is to design a source code that
minimizes the average distortion when both channels work, subject
to constraints on the average distortion when only one channel
works. The rate distortion region for a memoryless Gaussian
source and squared-error distortion measure for the multiple
description problem has been derived in [2], [3]. Surprisingly, despite
strong potential applications to speech and video transmission
over packet switched networks and to digital mobile telephony,
the design of such codes has received little attention. Jayant [4]
considers the design of a system based on subsampling, for
packetized speech.

We wish to encode the output of a stationary, ergodic and
memoryless source which is represented by the random process
$\{X_n, n \in \mathbb{Z}\}$ with zero mean, variance $\sigma^2$ and known
probability density function (pdf). The entropy-coded multiple
description scalar quantizer (ECMDSQ) is illustrated in Fig. 1. Let
$\mathcal{I} = \{1, 2, \ldots, N\}$, $\mathcal{I}_1 = \{1, 2, \ldots, M_1\}$,
$\mathcal{I}_2 = \{1, 2, \ldots, M_2\}$ and assume that $N \le M_1 M_2$. The source
sample $x$ is mapped by $q(\cdot)$ to the index $n$ that takes values in
$\mathcal{I}$. The operation of $q(\cdot)$ can be described in terms of a vector
of thresholds $t = (t_1, t_2, \ldots, t_{N-1})$, $t_1 < t_2 < \cdots < t_{N-1}$, by
the equation $q(x) = n$ if $x \in [t_{n-1}, t_n)$, $n = 1, 2, \ldots, N$, where
$[t_0, t_N)$ is the support of the source pdf. The index $n$ is mapped
to indices $i \in \mathcal{I}_1$ and $j \in \mathcal{I}_2$ by $a_1(\cdot)$ and $a_2(\cdot)$,
respectively. The mapping $(a_1, a_2)$ is called the index assignment.
We associate with each channel a variable length code
$C_m = \{c_{mi},\ i = 1, 2, \ldots, M_m\}$, $m = 1, 2$, where each codeword
$c_{mi}$ is a binary string of length $l_{mi}$, $i = 1, 2, \ldots, M_m$, $m = 1, 2$.
Indices $i$ and $j$ are mapped by variable length encoders $\gamma_1$ and
$\gamma_2$ to codewords $c_{1i}$ and $c_{2j}$, and transmitted over Channel 1
and Channel 2, respectively. If only Channel 1 (Channel 2) works,
the index $i$ ($j$) is recovered by the variable length decoder and
mapped by side decoder $g_1$ ($g_2$) to a real number $y_1$ ($y_2$) which
takes values in the reconstruction codebook
$\mathcal{Y}_1 = \{y_{1i},\ i \in \mathcal{I}_1\}$ ($\mathcal{Y}_2 = \{y_{2j},\ j \in \mathcal{I}_2\}$).
If both channels work, central decoder $g_0$ maps the index pair
$(i, j)$ to a real number $y_0$ which takes values in the reconstruction
codebook $\mathcal{Y}_0 = \{y_{0n},\ n \in \mathcal{I}\}$.
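
The encoder and the three decoders described above can be sketched in a few lines. This is a toy illustration only: the thresholds, the index assignment and the reconstruction codebooks below are hypothetical values chosen for the example ($N = 3$, $M_1 = M_2 = 2$), and the variable length codes on the two channels are omitted.

```python
from bisect import bisect_right


def mdsq_encode(x, thresholds, assignment):
    """q(x) = n if t_{n-1} <= x < t_n (1-based n), then n -> (i, j)."""
    n = bisect_right(thresholds, x) + 1  # number of thresholds <= x, plus one
    i, j = assignment[n]
    return n, i, j


def mdsq_decode(i, j, y0, y1, y2, assignment):
    """Central decoder if both indices arrive, side decoder otherwise."""
    if i is not None and j is not None:
        inverse = {pair: n for n, pair in assignment.items()}
        return y0[inverse[(i, j)]]
    if i is not None:
        return y1[i]  # only Channel 1 works
    if j is not None:
        return y2[j]  # only Channel 2 works
    raise ValueError("both channels are down")


# Hypothetical toy design: thresholds t_1, t_2 and an index assignment (a_1, a_2).
THRESHOLDS = [-0.5, 0.5]
ASSIGNMENT = {1: (1, 1), 2: (1, 2), 3: (2, 2)}
Y0 = {1: -1.0, 2: 0.0, 3: 1.0}   # central codebook
Y1 = {1: -0.5, 2: 1.0}           # side codebook for Channel 1
Y2 = {1: -1.0, 2: 0.5}           # side codebook for Channel 2

n, i, j = mdsq_encode(0.7, THRESHOLDS, ASSIGNMENT)
central = mdsq_decode(i, j, Y0, Y1, Y2, ASSIGNMENT)
side1 = mdsq_decode(i, None, Y0, Y1, Y2, ASSIGNMENT)
side2 = mdsq_decode(None, j, Y0, Y1, Y2, ASSIGNMENT)
```

The central reconstruction uses the full cell index, while each side reconstruction only knows one coordinate of the index pair and is therefore coarser.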

Let $d_m(x, y_m)$ be the per-sample distortion between the source
sample and the output of the $m$th decoder, $m = 0, 1, 2$. We refer to
$d_0$ as the central distortion and to $d_1$ and $d_2$ as the side distortions.
The average central and side distortions are denoted by $\bar{d}_0$ and
$\bar{d}_1$ and $\bar{d}_2$.

For given values $M_1$, $M_2$, $D_1$, $D_2$, $R_1$ and $R_2$ a multiple
description scalar quantizer is said to be optimal subject to entropy
constraints if it minimizes $\bar{d}_0$ subject to $\bar{d}_1 \le D_1$,
$\bar{d}_2 \le D_2$, $H_1 \le R_1$ and $H_2 \le R_2$.

We derive necessary conditions for optimality, and present an
iterative design algorithm for locally optimal ECMDSQ's. The
assignment of index pairs to the quantizer bins, a crucial step
in the design, is also addressed. Convergence of the algorithm
is proved. As a reference system, we consider a multiple description
scalar quantizer (MDSQ) system in which fixed length binary
codes are used to transmit an index pair [5]. We also make
comparisons against the optimum performance theoretically attainable
(OPTA) [2], [3]. Our results indicate that significant performance
improvements are obtained over the MDSQ. For example,
with $R_1 = R_2 = 4.0$ bits/sample/channel and $M_1 = M_2 = 24$,
ECMDSQ achieves a given side distortion at a central distortion
which is 4.5 dB better than that of the MDSQ. Comparisons
against the OPTA indicate that for a given side distortion further
gains of 3 dB for the central distortion can be achieved over
ECMDSQ.

This work was supported in part by NSF Grant Number NCRr91045M.
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Figure  1:  Block  diagram  of  the  ECMDSQ  system. 
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ABSTRACT 

A pyramid source code is a code that assigns equal-length binary
strings to all reproduction codewords of equal (weighted) $\ell_1$ norm,
and finds application in the encoding of Laplacian-distributed data.
A  pyramid  source  encoding  is  partitioned  into  two  concatenated  map¬ 
pings;  the  first  from  source  word  to  reproduction  codeword  within  a 
codebook;  the  second  from  the  reproduction  codeword  to  a  binary 
string.  The  first  mapping  allows  distortion  and  is  accomplished  using 
trellis  coded  quantization.  The  second  mapping  is  noiseless  and  is  de¬ 
noted  as  enumeration.  Efficient  pyramid  enumeration  encoding  and 
decoding  algorithms  are  presented,  for  use  with  fixed-rate  or  variable- 
rate  pyramid  trellis  codes. 

SUMMARY 

A pyramid source code is a code that assigns equal-length binary
strings to all reproduction codewords of equal (weighted) $\ell_1$ norm.
Such codes are well-suited for encoding Laplacian data [1]-[3] and find
application in transform and sub-band image coding [4]-[7].

A  pyramid  source  encoding  can  be  partitioned  into  two  concate¬ 
nated  mappings;  the  first  from  source  word  to  reproduction  codeword 
within  a  codebook,  and  the  second  from  the  reproduction  codeword 
to  a  binary  string.  The  first  mapping  typically  allows  distortion  and 
is referred to herein as quantization or compression. The second mapping
is lossless, and is referred to as enumeration. Enumerative source
coding was introduced by Cover [8] for the lexicographic ordering of
n-tuples. The ordering developed in this paper is different due to
the pyramid formulation and the trellis structure. The pyramid codes
use trellis coded quantization [9]. The contribution of the paper is
to describe efficient pyramid enumeration encoding and decoding
algorithms. Fixed-to-fixed length and fixed-to-variable length pyramid
trellis codes are easily constructed using the enumeration algorithms.

Trellis coded quantization (TCQ) [9] is an efficient, low-complexity
source coding technique that, when used with entropy coding [10], [11],
can provide near-optimum rate-distortion performance for a broad
class of memoryless sources. The uniform pyramid trellis codes described
here use a (possibly scaled or translated) subset of the integer
lattice $Z$ as the codebook, partitioned into $2^{m+1}$ subsets, with integer
$m \ge 1$. The subsets are assigned to the $2^m$ branches leaving each
state in an $N$-state trellis defined by a rate-$m/(m+1)$ convolutional
encoder. The entropy-constrained TCQ results in [10] indicate that
most of the available granular gain is achieved with $m = 1$.

Consider the integer lattice $Z$, partitioned into 4 subsets, $D_j$,
$j = 0, 1, 2, 3$, as shown in Figure 1. The lattice point $z$ is in $D_j$ if
and only if $z \bmod 4 = j$. A time-invariant labeling assigns one subset
to every branch leaving each trellis state in an $N$-state trellis. For
each time step, $i$, a positive integer weight, say $w_i$, is assigned to the
trellis transition. If $y = [y_1, y_2, \ldots, y_L]$ is the sequence of codewords
corresponding to an $L$-step trellis path, then the weighted $\ell_1$ norm of
the path symbols (the length-$L$ path weight) is given by

$\|y\|_{1,w} = \sum_{i=1}^{L} w_i |y_i|$.
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For  any  fixed  initial  and  final  trellis  states,  say  s  and  t,  denote  the 
number of paths that begin in state $s$, end in state $t$, and have weight
$k$ as $N(s, t, L, w, k)$, $k = 0, 1, \ldots$. Computation of the weight
enumeration of the code is the basis for the enumeration encoding and
decoding.

z:       -5  -4  -3  -2  -1   0   1   2   3   4   5
subset:  D3  D0  D1  D2  D3  D0  D1  D2  D3  D0  D1

Fig. 1. Integer lattice codebook and partition into subsets.


A pyramid trellis encoding first maps an input sequence into the
sequence of reproduction letters corresponding to the minimum distortion
path through the trellis. Fixed- and variable-length enumeration
codes are considered for mapping the sequence of reproduction letters
to binary strings. In the first, consecutive $L$-tuples of the trellis
reproduction sequence are mapped to consecutive fixed-length strings
of (integral) $RL$ bits. The trellis search is constrained to allow only
sequences of encoded symbols of pre-selected weights. That is, if the
length-$L$ trellis path begins in state $s$ and ends in state $t$, then all code
sequences $y$ must have one of $M$ possible weights, say $k_i$, $i = 1, \ldots, M$,
such that

$\sum_{i=1}^{M} N(s, t, L, w, k_i) \le 2^{RL}$.

If $k_i = i - 1$, $i = 1, \ldots, M$, then the trellis codewords have been shaped
in sequence space so that each $L$-tuple lies within a pyramid. By
partitioning the integers $0, \ldots, 2^{RL} - 1$ into $M$ consecutive sets, each
of size $N(s, t, L, w, k_i)$, it is seen that the essence of the enumeration
is simply to map each of the $N(s, t, L, w, k_i)$ sequences of weight $k_i$ to
the integers $0, \ldots, N(s, t, L, w, k_i) - 1$.

In the variable-length encoding, the trellis is searched in the usual
way to find the minimum distortion path. Then, if the length-$L$ path
has weight $k$, the codeword $y$ is assigned a binary string, say $b(y)$, of
length $\lceil \log_2 N(s, t, L, w, k) \rceil$ bits. A prefix-free code is used to encode
$k$ as $c(k)$, and the binary string representing $y$ is $(c(\|y\|_{1,w}), b(y))$.
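
The weight enumeration $N(s, t, L, w, k)$ that underlies both encodings can be computed by a simple recursion over time steps. The sketch below is a minimal illustration under stated assumptions: the 4-state trellis `TRELLIS` with two branches per state is a hypothetical labeling chosen for the example, not the one used by the authors, and the names `count_paths` and `index_bits` are illustrative.

```python
def count_paths(trellis, s, t, weights, kmax):
    """Return [N(s, t, L, w, k) for k in 0..kmax] with L = len(weights).

    trellis[state] lists (next_state, subset) branches; a branch labeled
    with subset j may emit any integer z with z mod 4 == j, at cost w_i*|z|.
    """
    table = [[0] * (kmax + 1) for _ in trellis]
    table[s][0] = 1
    for w in weights:
        new = [[0] * (kmax + 1) for _ in trellis]
        zmax = kmax // w  # larger |z| would exceed the weight limit
        for state, branches in enumerate(trellis):
            for nxt, subset in branches:
                for z in range(-zmax, zmax + 1):
                    if z % 4 != subset:
                        continue
                    cost = w * abs(z)
                    for k in range(kmax + 1 - cost):
                        new[nxt][k + cost] += table[state][k]
        table = new
    return table[t]


def index_bits(count):
    """ceil(log2(count)) for count >= 1: bits needed to index `count` sequences."""
    return (count - 1).bit_length()


# Hypothetical 4-state trellis with subsets D0..D3 on its branches.
TRELLIS = [
    [(0, 0), (1, 2)],
    [(2, 1), (3, 3)],
    [(0, 2), (1, 0)],
    [(2, 3), (3, 1)],
]

one_step = count_paths(TRELLIS, 0, 0, [1], kmax=4)
two_step = count_paths(TRELLIS, 0, 0, [1, 2], kmax=4)
```

For a single step from state 0 back to state 0 the only branch carries $D_0$, whose codewords of weight at most 4 are $0$ and $\pm 4$, which gives the counts `[1, 0, 0, 0, 2]`. The variable-length scheme would then spend `index_bits(N)` bits on the index within a weight class.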
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Abstract 

In this work the improvement in rate-distortion performance of
an entropy coded dithered uniform (or lattice) quantiser,
incorporating appropriate pre/post filters, is shown and analysed.
The proposed scheme attains good coding performance, under the
MSE criterion, for any source distribution, although its design
depends only on the second order statistics of the source.


Figure 1: The Pre/Post Coded Dithered Quantization Scheme

We examine the enhancement in performance achieved by incorporating
pre/post linear filters into the universal coding scheme, composed
of a dithered quantizer and a lossless (entropy) coder. We assume
that the second order statistics of the source are known, and
a Mean Square Error (MSE) criterion is used. Considering Figure 1,
the source is denoted $X \in \mathbb{R}^n$, its reconstructed value is $\hat{X}$,
the (coded) dithered quantizer is described by the dashed area in the figure,
and the matrices $A$ and $B$ represent the pre and post filters, respectively.
$Q_k(\cdot)$ is a uniform scalar or $k$-dimensional lattice quantizer,
characterized by $G_k$, the normalized second moment of its lattice.
The pseudo-random dither $Z$ is distributed uniformly over the basic
Voronoi cell of the lattice (e.g., over $(-\Delta/2, \Delta/2)$ when a scalar
uniform quantizer with a step size $\Delta$ is used) and is available to both
encoder and decoder. The quantizer output is encoded, conditioned
on the dither values, by a possibly universal, lossless (entropy) coder.
The coding rate of the scheme is thus assumed to be the quantizer
entropy, conditioned on the dither, i.e.,
$R_Q = \frac{1}{n} H(Q_k(AX + Z) \mid Z)$ (see [1]).

As shown in [1], the coded dithered quantizer is equivalent to an
additive noise channel in the rate-distortion sense. The quantization noise
is independent of the source and it is distributed as $-Z$, which implies
that the overall MSE distortion of the scheme is
$D = \frac{1}{n} E\|B(AX - Z) - X\|^2$. The coding rate of the scheme is the
mutual information between the input and the output of the equivalent
additive noise channel, which in our case can be written as
$R_Q = \frac{1}{n} I(AX; AX - Z)$.
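
The additive-noise behaviour of a subtractively dithered quantizer is easy to observe in the scalar case. The sketch below is an illustration under simplifying assumptions: no pre/post filters (that is, $A = B = 1$), a Gaussian test source, and a step size of 0.5; the function names are hypothetical. The reconstruction error should stay within half a step and have mean square close to $\Delta^2/12$, whatever the source.

```python
import random


def dithered_quantize(x, step, z):
    """Quantize x + z on a uniform grid of the given step, then subtract the dither."""
    return step * round((x + z) / step) - z


def simulate_errors(num_samples, step, seed=0):
    """Reconstruction errors for a unit-variance Gaussian source (an assumed test source)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(num_samples):
        x = rng.gauss(0.0, 1.0)
        z = rng.uniform(-step / 2, step / 2)  # dither shared by encoder and decoder
        errors.append(dithered_quantize(x, step, z) - x)
    return errors


STEP = 0.5
errors = simulate_errors(10000, STEP)
mean_square_error = sum(e * e for e in errors) / len(errors)
```

Changing the source distribution in `simulate_errors` should leave the error statistics essentially unchanged, which is the property that lets the scheme be analysed as an additive noise channel.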

Intuitively speaking, coding performance is enhanced by incorporating
pre/post filtering, since one may allow at first higher distortion
in coding, and thus save rate, relying on noise power reduction at the
reconstruction by the post filter. Specifically, in designing the
quantization scheme we try to simulate the structure of the optimal 'forward
channel', achieving the rate-distortion function of a Gaussian source
(see e.g. [2]). Now, unlike the 'forward channel' realization, which
combines filtering and additive Gaussian noise, the additive quantization
noise in the scheme is usually not Gaussian. Nevertheless, we
suggest using the optimal filters of the Gaussian case, and here we examine
the resulting performance for an arbitrary source and the actual
quantization noise.

The redundancy of the scheme over the rate-distortion function of
the source is defined as

$\rho = R_Q(D) - R(D)$.  (1)


1. For any source which is $\mathcal{D}$ bits away from Gaussianity,

$\rho \le \mathcal{D} + \frac{1}{2} \log 2\pi e G_k$  (2)

where $\mathcal{D} = \frac{1}{n} \int f \log \frac{f}{f^*}$ is the divergence between $X$ and $X^*$,
$f$ is the source density and $f^*$ is a Gaussian density with
the same mean and covariance as $f$. Note that for a Gaussian
source $\mathcal{D} = 0$, and if we further assume that $G_k \to \frac{1}{2\pi e}$ for
a lattice quantizer with large dimension, then the scheme achieves
the rate-distortion function of the (Gaussian) source.

2. For any source with a density,

$\lim_{D \to 0} \rho = \frac{1}{2} \log 2\pi e G_k$  (3)

as in dithered quantization without pre/post filtering.

3. For any source with a covariance matrix $R_x$,

$\rho \le C\,|_{S = D}$  (4)

where $C$ is the power constraint capacity (at input level $S$
equal to the allowed distortion $D$) of the equivalent additive
channel $Y = AX - Z$, and $A$ is the appropriate pre-filter (for $R_x$
and $D$). Note that this bound implies low redundancy at high
distortion, and in particular it follows that $\rho \to 0$ as $D$ goes to
the source average power (since in that case $A \to 0$). In general
$C \le \frac{1}{2} \log 4\pi e G_k$, which is the upper bound for the redundancy
of dithered quantization without filters (see [1]).

The combination of (2), (3) and (4) leads to useful bounds on the
performance of pre/post filtered dithered quantization in the general
case. Figure 2(A) illustrates the information rate versus MSE for a
Gaussian source encoded by a pre/post filtered scalar dithered quantizer.
The graph is compared to Shannon's rate-distortion function
($R(D)$) and to the performance of a dithered quantizer without
filtering (B). Note that in this case the pre/post filtering saves about 0.75
bits at high distortion.


Figure 2: Information rate versus MSE for a Gaussian source: (A) with
pre/post filtering, (B) without filtering, and $R(D)$.
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We  derive  the  following  bounds  for  this  redundancy: 
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Abstract 

This article presents a procedure for optimising the scalar quantisers
based on the power spectrum density of the quantisation noise. The
input signal is assumed stationary in the wide sense, but no restriction
is made concerning its probability density function.

Summary 

The usual optimisation techniques for the scalar quantiser are centered
on properties of the probability density function (pdf) of the input signal [1]
[2]. In fact, there seems to be a tendency of the proposed schemes to obtain
an output pdf that more closely resembles the uniform type [3]. Usually,
the information on the noise power is sufficient to approach a given problem.
Sometimes, as in the case of matched filter design, the shape of the noise
spectrum plays a more important role.

The spectrum of the quantisation noise was shown in a recent paper to
be quite independent of the spectrum of the applied signal and remarkably
related to the probability density function of the signal derivative [4]. A small
quantisation step, as well as a uniform quantisation scheme, were considered
in that proposed model. A generalisation of that model is proposed in this
article, to account for the nonuniform case.

Quantisation noise can be thought of as the result of the application of the
signal $a(t)$ to a circuit with characteristic $f(a)$. The function $f(a)$ is periodic,
with period $d$, as shown below

$f(a) = a - md$,

$(m - \frac{1}{2})d \le a < (m + \frac{1}{2})d$, $m = 0, \pm 1, \pm 2, \ldots$  (1)
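
The periodic characteristic above can be written directly as a function. This is a minimal sketch assuming the cell boundaries are at odd multiples of $d/2$, with the lower boundary included in each cell; the name `quantisation_error` is illustrative.

```python
import math


def quantisation_error(a, d):
    """f(a) = a - m*d, where m is the integer with (m - 1/2)*d <= a < (m + 1/2)*d."""
    m = math.floor(a / d + 0.5)
    return a - m * d


# The characteristic is a sawtooth with period d and values in [-d/2, d/2).
samples = [quantisation_error(0.1 * k, 1.0) for k in range(-20, 21)]
```

Evaluating the function on a grid, as in `samples`, shows the sawtooth shape that the signal passes through before the noise spectrum is computed.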

The autocorrelation function of the quantisation noise can be evaluated,
as described in [5], and its power spectrum density can be obtained by using
the Wiener-Khintchine theorem [6],


where $p_{a'}(\cdot)$ is the probability density function of the derivative of the input
signal.

Equation 2 demonstrates that the power spectral density of the quantisation
noise is related to the probability density function of the derivative
of the input signal. The convergence of the noise spectrum to Equation 2,
as the step size decreases, is a result of a previous work [7]. The noise spectrum
reflects an infinite sum of contributions, each one with the shape of
the probability density function, but with decreasing intensity and increasing
bandwidth.

For the nonuniform case, one can assume that the signal is transformed
by a nonlinear function g(·) prior to the quantization process. This gives


Here, g'(s) is the derivative of the compression function.


Careful selection of g(s) can minimize the following expression and maximize
the signal to quantization noise ratio. The compression function must
be chosen in order to displace the peak of the quantization noise spectrum far
outside the signal bandwidth.

= rji  ^  s  i?  -C 

where P_N represents the noise power that falls inside the signal bandwidth.
This quantity can be made quite small, compared to the total noise.
A procedure for solving Equation 4 involves the linearization of the function
g(s).
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Abstract 

The  design  of  trellis  coded  quantization  (TCQ)  to  minimize  the 
mean-squared error (MSE) has been generalized to minimize E{|x - y|^v}
for positive integer v. Simulation results for memoryless uniform
and  Gaussian  sources  with  v  =  1  and  4  show  that  TCQ  outperforms 
scalar  quantization  (SQ)  in  the  same  way  as  for  v  =  2. 

Entropy-constrained trellis coded quantization (ECTCQ) has also
been studied for the difference distortion measure p(x, y) = |x - y|^v. Two
ECTCQ realizations have been considered. One is to design the TCQ
codebook subject to the output entropy of the source encoder. The
other is to design the TCQ codebook independently of the output
entropy, with the subsequent entropy code designed from the
probabilities of the (locally) optimized TCQ codewords for the source
sequence. The latter is suboptimal but requires fewer computations.
Simulations show that the performance of the TCQ system is generally
improved by combining it with an entropy encoder. ECTCQ
outperforms entropy-constrained scalar quantization (ECSQ) in all cases
considered. The performance difference between the two ECTCQ
realizations at low output entropy increases as v gets large, but vanishes
as the output entropy increases.

A TCQ system with short coding delays and a squared-error criterion
has also been studied. Simulations show that, with the same amount of
coding delay, the performance of a TCQ designed in this way is comparable
to that of vector quantizers (VQ) for the memoryless uniform and
Gaussian sources, while slightly inferior to VQ for the memoryless
Laplacian source. However, TCQ requires far fewer computations
than VQ, especially for large vector dimensions or encoding rates.

Summary 

Trellis coded quantization (TCQ) [1] is a low-complexity form
of trellis coding [2] in which the trellis branches are labeled with
reproduction subsets instead of individual reproduction levels. The
idea of designing TCQ to minimize the mean-squared error (MSE)
[1] is generalized to minimize the average distortion between the
input and the output of the quantizer, given a distortion measure
p(x, y) = |x - y|^v for positive integer v. Let the encoding rate be
R > 0. The TCQ codebook contains N = 2^{R+R''} codewords
partitioned into K = 2^{R'+R''} subsets (according to the rules in [1]),
each subset of L = 2^{R-R'} codewords, where 0 < R' ≤ R and 0 < R''.
The N_s-state encoding trellis is defined by a rate-R'/(R' + R'')
convolutional encoder with 2^{R'} branches entering/leaving each state.
Let X = {x_j : j = 1, 2, ..., ||X||} and Y = {y_i : i = 1, 2, ..., N}
represent the source (training) sequence and the TCQ codebook,
respectively. Let B_i = {x_j : x_j ∈ X is encoded as y_i}. Then the TCQ
codebook Y should be designed to satisfy the following conditions

    \sum_{x_j \in B_i} \mathrm{sgn}(x_j - y_i) \, (x_j - y_i)^{v-1} = 0, \quad \text{for odd positive integer } v, \qquad (1)

or

    \sum_{x_j \in B_i} (x_j - y_i)^{v-1} = 0, \quad \text{for even positive integer } v, \qquad (2)

for i = 1, 2, ..., N.
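Conditions (1) and (2) both say that the derivative of the cell distortion, the sum of |x_j - y_i|^v over B_i, vanishes at the codeword. The sketch below is one way to solve that condition numerically for a single cell, using bisection on the sign of the derivative. It is an illustrative assumption, not the update procedure used by the authors, and the name `generalized_centroid` is hypothetical.

```python
def generalized_centroid(samples, v, tol=1e-10):
    """Return y that minimises sum(|x - y|**v) over one cell's samples.

    The quantity sum(sgn(x - y) * |x - y|**(v - 1)) is non-increasing in y
    for v >= 1, so its zero crossing can be bracketed and bisected.
    """
    def signed_power_sum(y):
        total = 0.0
        for x in samples:
            diff = x - y
            if diff > 0:
                total += diff ** (v - 1)
            elif diff < 0:
                total -= (-diff) ** (v - 1)
        return total

    lo, hi = min(samples), max(samples)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if signed_power_sum(mid) > 0:
            lo = mid   # samples above mid dominate: the minimiser is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    cell = [0.0, 1.0, 2.0, 10.0]
    for v in (1, 2, 4):
        print(v, generalized_centroid(cell, v))
```

For v = 2 the result coincides with the arithmetic mean of the cell, and for v = 1 it is a median, which matches the familiar special cases of the two conditions.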

The performance of the TCQ system with distortion measure
p(x, y) = |x - y|^v, v = 1 and 4, is evaluated by simulation, for zero
mean and unit variance memoryless uniform and Gaussian sources.
The encoding trellises used are 4-, 8- and 256-state rate-1/2 Ungerboeck
amplitude modulation trellises [3]. The simulation results show
that i) TCQ always outperforms the scalar quantizer (SQ); ii) as
the number of trellis states increases, the TCQ performance also
increases; iii) for the Gaussian source, the gap between the TCQ
performance and the performance promised by the Shannon Lower
Bound [4] is significant, and increases as R increases; and iv) for the
Gaussian source, the improvement of TCQ over SQ increases as v
increases. The first three results are the same as those for v = 2 in
[1].

The performance of TCQ for non-uniformly distributed sources
is improved by combining it with entropy encoding. Such a scheme is called
entropy-constrained trellis coded quantization (ECTCQ) [5]. We
study ECTCQ with distortion measure p(x, y) = |x - y|^v. Two kinds
of ECTCQ systems are considered. The first realization is the natural
ECTCQ (as for v = 2 described in [5] and [6]). That is, the TCQ
and entropy encoder are jointly designed to minimize the functional
[8]

    J = E\{p(x, y)\} + \lambda E\{l(y)\}, \qquad (3)

where λ is a Lagrange multiplier and l(y) is the number of bits used
by the entropy code to represent y. The second realization is based
on designing the TCQ alone, in the same way as in the fixed-rate
TCQ system, and then designing the entropy encoder based on the
probabilities of the (locally) optimized TCQ codewords. The latter
approach requires fewer computations than the former, in terms
of both design and implementation. The updating equation for the
TCQ codebook of either ECTCQ realization is given by (1) or (2)
depending on the value of v. The entropy encoder is realized as
the state-entropy encoder [6], which assigns a single entropy coder
for each union codebook of subsets that appear as labels for the
branches leaving each trellis state.
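To make the functional in (3) concrete, the sketch below evaluates a Lagrangian cost of the same form for a plain scalar codebook. It is a deliberate simplification: the trellis is ignored, each sample independently picks the codeword with the smallest distortion-plus-rate cost, and the codeword length l(y) is approximated by the ideal length -log2 P(y). The function name and all of these modelling choices are assumptions for illustration only.

```python
import math

def lagrangian_cost(samples, codewords, probs, v, lam):
    """Average of |x - y|**v + lam * l(y) under a memoryless encoding rule.

    l(y) is taken as the ideal entropy-code length -log2(P(y)); a real
    entropy coder would only approximate this value.
    """
    lengths = [-math.log2(p) for p in probs]
    total = 0.0
    for x in samples:
        total += min(abs(x - y) ** v + lam * length
                     for y, length in zip(codewords, lengths))
    return total / len(samples)

if __name__ == "__main__":
    data = [0.0, 1.0]
    book = [0.0, 1.0]
    print(lagrangian_cost(data, book, [0.5, 0.5], v=2, lam=0.0))
    print(lagrangian_cost(data, book, [0.5, 0.5], v=2, lam=0.5))
```

Setting `lam` to zero recovers the fixed-rate distortion criterion, while larger values trade distortion for shorter codeword lengths.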

Simulation results with v = 1, 2, and 4 for zero mean and unit
variance  memoryless  Gaussian  and  Laplacian  sources  show  that  the 
performance  improvement  of  the  jointly  designed  system  over  the 
corresponding  independently  designed  system  at  low  output  entropy 
depends  on  the  distortion  criterion.  The  larger  v,  the  larger  the 
improvement.  However,  such  improvement  vanishes  as  the  output 
entropy  of  the  encoder  increases.  Overall,  entropy-constrained  tech¬ 
niques  outperform  the  corresponding  fixed-rate  schemes. 

The TCQ with short coding delays is studied. Simulation results
with v = 2 show that a TCQ designed in this way has performance
comparable to that of vector quantization (VQ) (designed with the clustering
algorithm [7]) for the memoryless uniform and Gaussian sources, but
is slightly inferior to VQ for the memoryless Laplacian source.
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We  consider  the  problem  of  asymptotic  quantization  in  conjunction 
with  a  noisy  binary  symmetric  channel.  For  a  noiseless  channel, 
Bennett's  integral  is  a  formula  for  the  distortion  of  a  scalar  quantizer 
given  in  terms  of  the  source  density,  the  number  of  quantization  points 
(assumed  to  be  large),  and  the  distribution  of  quantization  points,  or  point 
density.  In  this  paper  we  extend  Bennett's  integral  to  the  case  where  the 
quantizer is used in conjunction with a noisy binary symmetric channel,
assuming that channel codewords are assigned randomly. We also
derive an expression for the optimum noisy channel point density.

Summary

One of Shannon's fundamental results is that source and channel
coding can be treated separately without loss of performance. This
usually leads to the separate design of source and channel coders (e.g. a
source code can be designed assuming a noiseless channel). Shannon's
result, however, is one that requires arbitrarily complex source and
channel coders, which is not reasonable in practice. The practical
channel code cannot guarantee zero error probability and, consequently,
the performance of a source code designed for a noiseless channel will
degrade when used in conjunction with a noisy channel. Thus one must
analyze and design the source code with the noisy channel in mind.

The purpose of this paper is first to develop an expression for the
distortion of a quantizer used in conjunction with a noisy binary
symmetric channel, and then to find the optimum distribution of
quantization points, or point density, when the number of quantization
points is large and the channel codewords are assigned randomly. In
previous work [1] we derived lower bounds to distortion in terms of the
point density using a "greedy" codeword assignment. This paper gives
more accurate estimates of the distortion caused by the noisy channel.
Recently, Zeger and Manzella [2] have derived a similar, but not quite
identical expression for distortion, without optimizing the distribution of
quantization points.

A noisy channel quantizer is described by a set of N (= 2^L)
quantization points C = {y_i}_{i=1}^{N}, a partition S = {S_i}_{i=1}^{N} of the real line,
and a codeword assignment A = {c_i}_{i=1}^{N}, where c_i ∈ {0,1}^L is the L =
log2 N bit codeword assigned to quantization point y_i. Given a source
sample u, the encoder determines in which cell S_i the sample u lies and
produces an L-bit binary sequence c_i which is transmitted across a binary
symmetric channel with crossover probability q < .5. The channel output
is an L-bit binary sequence c_j. The decoder receives c_j and outputs the
quantization point y_j. The mean squared error that results can be written

    D = D_s + D_c \qquad (1)

    D_s = \sum_{i=1}^{N} \int_{S_i} (u - \bar{y}_i)^2 \, p(u) \, du \qquad (2)

    D_c = \sum_{i=1}^{N} P(U \in S_i) \sum_{j=1}^{N} q_L(c_j | c_i) \, (\bar{y}_i - y_j)^2 \qquad (3)

where q_L(c_j | c_i) is the probability that c_j is received given that c_i is
transmitted, p(u) is the probability density of the source, and \bar{y}_i =
E[U | U ∈ S_i] is the probabilistic centroid of S_i. The first term in (1)
(source distortion D_s) is the distortion that results assuming a noiseless
channel and a codebook consisting of the \bar{y}_i's. The second term (channel
distortion D_c) is the distortion due to channel errors.
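The decomposition into source and channel distortion can be checked numerically in a simple case. The sketch below assumes a source uniform on [-0.5, 0.5], a uniform quantizer whose points are the cell centroids, a memoryless binary symmetric channel, and (by default) a natural binary codeword assignment rather than the random assignment analysed in the paper. The function names are hypothetical.

```python
def bsc_transition(ci, cj, L, q):
    """Probability that L-bit codeword cj is received when ci is sent."""
    d = bin(ci ^ cj).count("1")          # Hamming distance between codewords
    return q ** d * (1 - q) ** (L - d)

def distortions_uniform(L, q, assignment=None):
    """Return (Ds, Dc) for a uniform source and uniform N = 2**L quantizer."""
    N = 2 ** L
    # Cell i is [-0.5 + i/N, -0.5 + (i+1)/N]; its centroid is the midpoint.
    points = [-0.5 + (i + 0.5) / N for i in range(N)]
    codes = list(range(N)) if assignment is None else assignment
    Ds = 1.0 / (12 * N ** 2)             # granular error of a uniform cell
    Dc = 0.0
    for i in range(N):
        for j in range(N):
            prob = bsc_transition(codes[i], codes[j], L, q)
            Dc += (1.0 / N) * prob * (points[i] - points[j]) ** 2
    return Ds, Dc

if __name__ == "__main__":
    for q in (0.0, 0.001, 0.01, 0.1):
        print(q, distortions_uniform(5, q))
```

Passing a permuted list as `assignment` shows how strongly the channel term depends on the codeword labelling, which is why the paper fixes the assignment model before optimising the point density.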

Bennett's integral is an asymptotic formula (i.e. for large N) for D_s
that depends on the distribution of quantization points, or point density
λ(u), the source density p(u) and the number of points N. For a scalar
quantizer with N large,

    D_s \approx \frac{1}{12 N^2} \int \frac{p(u)}{\lambda^2(u)} \, du. \qquad (4)

This expression can be used to find the optimum (in the minimum mean
squared error sense) point density for a noiseless channel, which is found to be

    \lambda(u) = c \, p^{1/3}(u) \qquad (5)

where c is a constant independent of N such that \int \lambda(u) \, du = 1.

The principal result of this paper is the following expression for the
channel distortion D_c in terms of the point density λ(u), the number of
points N, and the channel crossover probability q, when the codeword
assignment is random:

    D_c = \left( 1 - (1-q)^L \right) \left( \int u^2 \lambda(u) \, du + \sigma^2 - 2 D_s \right) \qquad (6)

where σ² is the variance of the source and D_s is the source distortion.
For a given source density p(u), size N, and channel crossover
probability q, the total source plus channel distortion is

    D = \frac{1}{12 N^2} \int \frac{p(u)}{\lambda^2(u)} \, du + \left( 1 - (1-q)^L \right) \left( \int u^2 \lambda(u) \, du + \sigma^2 - 2 D_s \right)

One may minimize this expression with respect to the point density λ(u)
subject to the constraint that λ(u) integrates to 1. This is an isoperimetric
problem of the calculus of variations which yields the noisy channel
point density

    \lambda^*(u) = \frac{c^{1/3} \, p^{1/3}(u)}{\left( (1-Q) u^2 + \mu \right)^{1/3}}, \qquad (7)

where c = (4Q - 1)/(12 N^2), Q = (1-q)^L and μ is a constant such that λ*(u)
integrates to 1.

To see that the optimum point density in (7) performs better than the
point density in (5), consider a uniform source on [-.5, .5]. From (7), the
optimum noisy channel point density is

    \lambda^*(u) = \frac{c^{1/3}}{\left( (1-Q) u^2 + \mu \right)^{1/3}} \qquad (8)

where c, Q and μ were defined previously. Figure 1 compares the
signal-to-quantization noise performance as a function of q for an N=32
"Channel-optimized" quantizer with the optimal point density in (8) and
a "Non-channel optimized" uniform scalar quantizer (the optimum
noiseless channel scalar quantizer) as suggested in [2]. The
channel-optimized point density performs about 3dB better for sufficiently large
channel error probabilities.
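The total distortion expression can be evaluated numerically for any candidate point density. The sketch below does so with a simple midpoint rule, taking Bennett's integral for the source term and the random-assignment expression (6) for the channel term exactly as stated in the summary. The helper names, the integration rule and the step count are assumptions made for illustration.

```python
import math

def midpoint_integral(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (k + 0.5) * h) for k in range(steps))

def total_distortion(p, lam, N, q, a, b):
    """Return (Ds, Dc) for source density p and point density lam on [a, b]."""
    L = math.log2(N)
    mean = midpoint_integral(lambda u: u * p(u), a, b)
    var = midpoint_integral(lambda u: (u - mean) ** 2 * p(u), a, b)
    # Bennett's integral for the source distortion.
    Ds = midpoint_integral(lambda u: p(u) / lam(u) ** 2, a, b) / (12 * N ** 2)
    # Channel distortion for random codeword assignment, as in (6).
    second_moment = midpoint_integral(lambda u: u * u * lam(u), a, b)
    Dc = (1 - (1 - q) ** L) * (second_moment + var - 2 * Ds)
    return Ds, Dc

if __name__ == "__main__":
    uniform = lambda u: 1.0
    for q in (0.0, 0.001, 0.01):
        print(q, total_distortion(uniform, uniform, 32, q, -0.5, 0.5))
```

Replacing the second argument with another normalised density makes it possible to compare candidate point densities at a given crossover probability.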
Figure 1: Signal-to-noise ratio for scalar quantizers with uniform and
channel-optimized point density, in terms of crossover probability q.
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ABSTRACT 

In  this  paper,  the  combinatorial  optimization  algorithm  known  as  simulated 
annealing  is  used  for  the  optimization  of  the  trellis  structure  or  the  next-state 
map  of  the  decoder  finite-state  machine  in  trellis  waveform  coding.  The  gen¬ 
eralized  Lloyd  algorithm  which  finds  the  optimum  codebook  is  incorporated 
into  simulated  annealing.  Comparison  of  simulation  results  with  previous 
work  in  the  literature  shows  that  this  combined  method  yields  coding  sys¬ 
tems  with  good  performance. 

1  Introduction 

A high-performance waveform coding technique is known as trellis, lookahead,
or delayed decision source or waveform coding [1]. Trellis waveform
coding uses a finite-state machine as the decoder. This machine is defined
by an output map, corresponding to the codebook, and a next-state
map, corresponding to the trellis structure, both of which are functions of
the channel symbol and the current state. The extension of the next-state
map or the state transition diagram in time is known as a trellis structure,
a weighted directed graph consisting of identical stages. Each stage corresponds
to a time instant. The encoder is matched to the decoder: it examines
the trellis and finds the channel sequence that leads to minimum distortion,
which is the sum of the distortion values between the input and reproduction
symbols. This can be accomplished by a trellis search algorithm, such as
the Viterbi Algorithm (VA). The encoder in a trellis waveform coding system
is simply a trellis search algorithm matched to the decoder finite-state machine.
Therefore, the design problem reduces to the design of the decoder
finite-state machine. This problem has been addressed by several authors in
the literature; see, e.g., [1], [2]. The design of the finite-state machine for a
quantizer, using a trellis search, or in the context of finite-state vector
quantization, without any search, has also been addressed in the literature; see,
e.g., [3]. In this work, we optimize both the codewords and the finite-state
machine structure of a scalar trellis waveform coder that uses the Viterbi
algorithm, using a near-optimum approach. For the optimization of the decoder
finite-state machine, we observe that, since the decoder
is equivalent to the trellis structure, for a given set of codewords and a given
input sequence, finding the optimum decoder is equivalent to
finding the trellis structure that will generate a channel sequence with minimum
distortion at the decoder output. This is a combinatorial optimization
problem and can be solved by known optimization methods. In this paper,
we propose the simulated annealing algorithm [4] for this purpose.
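The encoder described above can be sketched as a Viterbi search over the decoder finite-state machine. In the minimal Python example below, `next_state[s][b]` and `output[s][b]` stand for the next-state map and output map for state s and channel bit b; these names, the squared-error distortion and the fixed starting state are assumptions for illustration, not details taken from the paper.

```python
def viterbi_encode(x, next_state, output, start_state=0):
    """Find the channel bit sequence with minimum total squared error.

    Returns (bits, distortion) for the best path through the trellis
    defined by the decoder's next-state and output maps.
    """
    n_states = len(next_state)
    inf = float("inf")
    cost = [inf] * n_states
    cost[start_state] = 0.0
    paths = [[] for _ in range(n_states)]
    for sample in x:
        new_cost = [inf] * n_states
        new_paths = [None] * n_states
        for s in range(n_states):
            if cost[s] == inf:
                continue                      # state not reachable yet
            for b in (0, 1):                  # rate 1 bit/sample
                t = next_state[s][b]
                c = cost[s] + (sample - output[s][b]) ** 2
                if c < new_cost[t]:           # keep the survivor into state t
                    new_cost[t] = c
                    new_paths[t] = paths[s] + [b]
        cost, paths = new_cost, new_paths
    best = min(range(n_states), key=lambda s: cost[s])
    return paths[best], cost[best]

if __name__ == "__main__":
    ns = [[0, 1], [0, 1]]                     # two-state shift-register trellis
    out = [[-1.0, 1.0], [-0.5, 0.5]]
    print(viterbi_encode([0.9, -0.4, 0.6], ns, out))
```

The returned distortion is the minimum metric over all paths, which is the quantity a structure search would use as its cost for a given codebook and training sequence.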

2  The Design Method

In this work, the state space is chosen to be all the possible state transitions
in a single stage of the trellis. We are interested in trellis waveform coders
with rate 1 bit/sample. This imposes a constraint on the encoder structure:
from each node, there are two outgoing branches which correspond to values
of 0 and 1 for the binary channel code. We also constrain the number of
input branches going into each node: there are two incoming branches.
This constraint is imposed in order to obtain a more symmetric structure so
that the search space is minimized and the possibility of pathological trellis
structures is eliminated. The move set has been chosen to be just
the flipping of two branches, so that the output of a move is again in the state
space. The cost function is simply the minimum metric calculated by VA. The
initial value of the control parameter is calculated as suggested by Johnson
et al. [5]. Geometric improvement is used as the cooling schedule. The
length of the Metropolis loops is determined experimentally. As the source, a
first order Gauss-Markov source with autocorrelation coefficient 0.9 is used.
This source is chosen since it is a common model of real data and it is widely
used in comparing data compression systems. For the design of codewords,
we used the generalized Lloyd algorithm (GLA) [1].

In  this  work,  GLA  and  SA  are  run  together.  For  a  given  codebook,  the 
trellis  structure  is  optimized  using  SA,  and  for  this  structure,  the  codebook  is 
modified  using  GLA.  The  process  is  stopped  when  the  system  reaches  an 
equilibrium,  with  respect  to  the  SA  criteria. 
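The structure search described in this section can be sketched as a standard simulated annealing loop. The example below assumes a move that swaps the destinations of two branches (which keeps two incoming and two outgoing branches per state), Metropolis acceptance, and a geometric cooling schedule. The initial temperature, cooling factor and loop length are placeholder values, and `cost_fn` stands in for the minimum Viterbi metric; none of these specifics are taken from the paper.

```python
import math
import random

def swap_two_branches(next_state, rng):
    """Swap the destinations of two random branches (in-degrees unchanged)."""
    new = [row[:] for row in next_state]
    n = len(new)
    s1, b1 = rng.randrange(n), rng.randrange(2)
    s2, b2 = rng.randrange(n), rng.randrange(2)
    new[s1][b1], new[s2][b2] = new[s2][b2], new[s1][b1]
    return new

def anneal(next_state, cost_fn, t0=1.0, alpha=0.9, loop_len=50,
           t_min=1e-3, seed=0):
    """Return the best trellis structure seen and its cost."""
    rng = random.Random(seed)
    current, current_cost = next_state, cost_fn(next_state)
    best, best_cost = current, current_cost
    t = t0
    while t > t_min:
        for _ in range(loop_len):             # one Metropolis loop
            cand = swap_two_branches(current, rng)
            cand_cost = cost_fn(cand)
            delta = cand_cost - current_cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = cand, cand_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha                            # geometric cooling
    return best, best_cost

if __name__ == "__main__":
    start = [[0, 1], [2, 3], [0, 1], [2, 3]]  # 4-state shift-register trellis
    toy_cost = lambda ns: sum(abs(t - s) for s, row in enumerate(ns) for t in row)
    print(anneal(start, toy_cost))
```

In a combined design, the codebook would be re-estimated with the generalized Lloyd algorithm for the structure returned by `anneal`, and the two steps alternated until the annealing criteria indicate equilibrium.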
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3  Results 

Trellis  waveform  coding  systems  of  different  constraint  lengths  were  trained 
using  a  first  order  Gauss-Markov  source,  and  were  coded  using  SA  and 
GLA.  For  constraint  lengths  of  K  =  2-8,  signal-to-quantization-noise  ratios 
(SQNR)  were  computed.  Then  the  system  was  tested  using  another  first 
order Gauss-Markov source. In Table 1, the computed SQNR [dB] values
are given (SA+GLA) together with the results of Stewart et al. (GLA) [1],
and of Ayanoglu and Gray (PS) [6]. Results obtained using SA are better
than those of [1]. This is expected since in [1] the trellis structure was fixed,
not optimized. The results obtained via the predictive system [6] are better
than those of the combined system, especially for low constraint lengths. Again,
this is expected since the predictive system has a higher system complexity.
However, our results are sufficiently close to those of [6] for intermediate
constraint lengths, so that the nonpredictive system once again becomes
attractive. Alternatively, SA can be incorporated into the predictive system
design with possibly better performance.

References 

[1] L. C. Stewart, R. M. Gray, Y. Linde, "The Design of Trellis Waveform
Coders," IEEE Trans. Comm., Vol. COM-30, pp. 702-711, April 1982.

[2] G. H. Freeman, J. W. Mark, I. F. Blake, "Trellis Source Codes Designed
by Conjugate Gradient Optimization," IEEE Trans. Comm., Vol. COM-36,
pp. 1-12, January 1988.

[3] J. Foster, R. M. Gray, M. O. Dunham, "Finite-State Vector Quantization
for Waveform Coding," IEEE Trans. Info. Theo., Vol. IT-31, pp. 348-359,
May 1985.

[4] S. Kirkpatrick, C. D. Gelatt, M. P. Vecchi, "Optimization by Simulated
Annealing," Science, Vol. 220, pp. 671-680, May 1983.

[5] D. S. Johnson, C. R. Aragon, L. A. McGeoch, C. Schevon, "Optimization
by Simulated Annealing: an Experimental Evaluation, Parts I and II,"
Operations Research, Vol. 37, pp. 865-892, December 1989, and Vol.
39, pp. 378-406, June 1991.

[6] E. Ayanoglu, R. M. Gray, "The Design of Predictive Trellis Waveform
Coders Using the Generalized Lloyd Algorithm," IEEE Trans. Comm.,
Vol. COM-34, pp. 1073-1081, November 1986.


         SA+GLA            GLA               PS
K     train    test     train    test     train    test
2      6.92    6.86      6.92    6.86     11.08   10.73
3      9.81    9.45      8.77    8.59     11.53   11.18
4     11.24   11.13     10.13    9.87     11.84   11.47
5     11.90   11.77     11.05   10.67     12.18   11.83
6     12.00   11.90     11.56   11.09     12.38   11.96
7     12.29   11.98     11.8.   11.70     12.52   12.52
8     12.32   11.97     12.13   11.91     12.64   12.58

Table 1: SQNR [dB] values. SA+GLA: Simulated Annealing and Generalized
Lloyd Algorithm, GLA: Generalized Lloyd Algorithm only, PS: Predictive
System, K: Constraint Length.




Conference  Author  Index 


A 

Aazhang,  B.  43.  206,  379 

Abdel-Ghaffar.  K.  A.  S.  123 

Abrahams,  J.  216 

Adachi.F.  345 

Agrell,  E.  394 

Ahlswede,  R.  396

Alabbadi,  M.  199 

Alajaji.  F.  426 

Al-Bassam,  S.  9 

Albuquerque,  A.  A.  198 

Alencar,  F.  M.  R.  227 

Alencar,  M.  S.  190,  440 

Ali,l;  325 

Alon,  N.  433 

Al-Rumaih,  R.  M.  257 

Amit.Y.  185 

Amrani,  O.  61 

Anderson,  J.  268 

Anderson,  J.  B.  19,  203,  271.  417 

Ansari,  A.  424 

Antweiler.  M.  411 

Araki,  K.  34 

Ariel,  M.  25 

Arikan,  E.  152 

Arimoto,  S.  218,395 

Ashley,  J.  J.  4 

Atlas,  L.  E.  173

Aulin.T.  272 

AxieU.M.  L.  350 

Ayanoglu,  E.  219,317,443 

AygdW,  U.  290 

B 

Balamesh,  A.  S.  435 

Balram,  N.  281 

Baram,  Y.  430 

Baras,  J.  S.  340 

Barg,  A.  129 

Barron,  A.  R.  31,54 

Baum,  C.  W.  109 

Be’ery.Y.  29,61 

Bdgin,  G.  386 

Belfiore,  J.  C.  342

Bell.  M.  R.  428 

Belongie,  M.  238 

Belzile,  J.  108,  267 

Bender,  P.  E.  6 

Benedetto,  S.  413 

Benyamin-Seeyar,  A.  99 

Berger,  T.  151,223 

Betz,J.  W.  16,18 

Bhargava,  V.  K.  97,301 

Biglieri,  E.  287,  3M 

Biniashvili,  A.  239 

Bitzer,  D.  L.  270 

Blake,  I.  F.  78.96 

Blakley.B.  229 

Blakley,  G.  R.  229 

Blaum,  M.  125,  126,  294,  295

Bloemen,  A.  H.  A,  300 

Blostein,  S.  D.  89 

Blum,  R.  S.  12 

Bobrowski,  R.  163 

Boncelet,  C.  G. ,  Jr.  118 

Bose,  B.  7,  9 

BonUd.K.  342 

Boors,  P.  A.  H.  127 

Bovik.A.C.  427 

Boast,  S.  410 

Brady,  D.  48,50 

Brandt-Pearce,  M.  379

Breen,  M.  A.  350 

Bross,  S.  78 

Brown,  C.  P.  405 

Brualdi,  R.  A.  366

Bruck,  J.  126,  294,  295,  433


Burnashev,  M.  V.  15
Burr,  A.  G.  67.284 

C 

Calderbank,  A.  R.  137.  141,  154,  183, 
251 

Cambanis,  S.  315.  328 

Camion,  P.  146 

Campbell,  L.  L.  310 

Campello  de  Souza,  R.  M.  227 

Capocelli,  R.  M.  7 

Cartel,  C.  305 

Castelli,  V.  355 

Cenkl.M,  405 

Cercas.  F.  A.  B.  198 

Chabanne,  H.  398 

Chan,  A.  H.  229 

Chan.F.  414 

Chan.W.  K.  211 

Chan.W.-Y.  335 

Chandran,  S.  R.  318 

Chang,  C.  S.  161 

Chang,  C.-S.  215 

Chang,  S.  C.  79

Chao.C.-C.  346 

ChamKeitKong,  P.  138 

Chayat,  N.  259 

Chen.B.  92 

Chen.C.-C.  27 

Chen.J.  282 

Chen,  P.-N.  11

Chen.X.  302 

Chen.Z.  223 

Cheng.  R.S.  209.260 

Cherubini,  G.  241,  253

Cheung,  K.-M.  381 

Chiu,  M.-C.  346 

Chou,  P.  A.  53 

Cimadevilla,  M.  O.  94 

Cioffi.J.  M.  47 

Clark.  J.J.  331 

Clarke,  B.  54 

Clarke,  W.  A.  299

Cochran,  D.  330,331 

Coffey,  J.T.  158.303 

Cohen,  G.  D.  150,  370 

Cohen.  J.  E.  1 

Cohn,  D.  176 

Collins.  O.  20 

Conte.  E.  87 

Cooper,  A.  B.,  III  82

Costello,  D.  J.,  Jr.  139,  144,  415

Courteau,  B.  146 

Cover,  T.  M.  311,  355

Csibi,  S.  319 

Csiszár,  I.  73

D 

Dabak.A.  207 
da  Costa  e  Silva,  M.  A.  O.  63 
Dale,  M.  376 
Dallal.Y.E.  168 
da  Rocha,  V.  C. ,  Jr.  80 
Dettmar,  U.  382 
Dholakia.A.  270 
Di  Bisceglie,  M.  87,  90 
Di  Porto.  A.  37 
Divsalar,  D.  381 
Dixit,  C.  224 
Dodunekov,  S.  M.  306 
Dolinar,  S.  381 
Domaszewicz,  J.  437 
Dorsch.B.  O.  399 
Drakul.S.L.  287 
Drane.C.  R.  157 
Dumer,  I.  I.  31


E 

Effros.M.  53 
Ehihatd.  D.  250 
Elia.M.  360 
Ephremides.  A.  324 
Ericson,  T.  296 
Etzion,  T.  197

F 

Fang.G.  39 
Fang.Y.  389 
Faragó,  A.  431
Farrell,  P.  G.  32 
Farvardin,  N.  169,  392,  425 
Feder,  M.  52,  72,  74.  420.  439 
Feng.G.  L.  95.304 
Ferland,  G.  383 
Ferreira.  H.  C.  162,299 
Fessler.  J.  A.  131 
Filip.  P.  391 
Fine,  T.  L.  351,  432
Finesso,  L.  1 86 
Fischer.  T.R.  438,441 
Fishburn,  P.  C.  137,  141
Fitz,  M.  P.  164 
Fitzpatrick,  P.  98 
Fonseka.  J.  P.  349 
Forest,  S.  108 
Forney,  G.  D.,  Jr.  177
Franaszek,  P.  A.  3 
Francos,  J.  M.  93 
Frankl.  P.  154 
Freeman,  G.  H.  119 
Fuja.T.  102.122,426 
Fujiwara.  E.  40,  244 
Fujiwara,  H.  26 
Fujiwara,  T.  68 
Fukumasa,  H.  404 

G 

Gabidulin.  E.  M.  412 
Gagliardi,  R.  376 
Gagnon,  F.  344 
Games,  R.  A.  405 
Gao.y.  382 
GareUo.  R.  413 
Gehrman,  C.  230
German.  D.  133 
Gersho.  A.  170,  335 
Gibson,  J.  D.  94 
Giraud.X.  342 
Gorman,  J.  D.  136 
Gozzo,  F.  203 
Graham,  R.  L.  154 
Greenberg,  G.  197 
Grenander,  U.  185
Gu,J.  122 
Gubner,  J.  A.  309
Guida,  F.  37
Gulliver,  T.  A.  301 
Günther,  C.  G.  70
Guo.N.  338 
Györfi,  L.  51,  55,  321

H 

Haccoun.  D.  108,267,414 
Hagen,  R.  171 
Hajek.B.  320 
Halpenny,  L.  225 
Hamada,  M.  244 
Hammons,  R.  196 
Han.T.S.  71.153 
Han.Y.S.  27 
Hardin,  R.  H.  60
Harris,  C.F.  279 
Hartmann,  C.  R.  P.  27 


HasM,  M.  A.  97 

Hashimoto,  T.  101,385 

Hassan,  A.  A.  111,  377

Hassner,  M.  358 

Hala,  M.  297 

Hauge,  E.  R.  361 

Haussler,  D.  54

Hedelin,  P.  171 

Heegard,  C.  238,283,341,401 

Hekstra,  A.  P.  21

Helberg,  A.  S.  J.  299

Helleseth.T.  361 

Henkel,  W.  285 

Hero.  A.  O.  131.187,191 

Hershey,  J.  E.  377 

Herzberg,  H.  62 

Higgle,  G.  R.  336 

Hirasawa,  S.  397 

Hirschfeld.  J.  W.  P.  246 

Hole.K.  J.  242 

Holubowicz,  W.  163 

Honary,  B.  23 

Honig,  M.  L.  49,  372 

Honkala,  I.  S.  39,  197 

Horowitz,  J.  133 

Hou,  X.-d.  193 

How.  S.  K.  416 

Hu,  I.  316 

Huang,  S.-C.  276 

Huang,  Y.-F.  276 

Huber.  K.  359 

Hughes,  B.  82,  210,  323 

Hussain.  Y.  392 

I 

Imai,  H.  64,138,404 
Immink,  K.  A.  S.  2 
Irie,  H.  347 
Itoh,  S.  282 

J 

Ji.C.  434 

Jia.  M.  99 

Jiang,  S.  40 

Jinushi,  H.  36,  292 

Johannesson,  R.  268.  288 

Johansson,  T.  231 

Johnson,  D.  H.  17,  207 

K 

Kabatianskii,  G.  A.  255 
Kadota.T.  T.  13 
Kailath.T.  188 
Kaleh.G.  K.  201 
Kallel,  S.  100.  273 
Kamabe,  H.  8 
Kamiya,  N.  307 
Kantorovitz,  M.  R.  393 
Kao.Y.-H.  340 
Kaplan,  G.  105,  266 
Kasami,  T.  68 
Kassam,  S.  A.  12
Kawabata,  T.  112 
Kennedy,  G.  T.  365 
Kerpez,  K.  J.  261 
Ketseoglou,  T.  J.  322 
Khachatrian,  L.  H.  295 
Khayrallah,  A.  S.  308 
Kieffer.J.  C.  277 
Kiely,  A.  B.  158 
Kim.  D.  205 
Kløve,  T.  147
Knagenhjelm,  P.  339 
Kobayashi,  H.  220 
Kobayashi,  K.  113,  406 
Koch,  M.  285 
Kofman,  Y.  160 
Koga.H.  395 
Kohno.R.  404 
Kolodziejski,  K.  R.  16 
Komo,  J.  J.  362
Koplowitz,  J.  278


Koshelev,  V.  N.  212 
Koski.T.  436 
Kot,  A.  D.  291 
Kötter,  R.  33
Krauss,  T.  P.  88 
Krouk,  E.  A.  255 
Krzyzak,  A.  353
Kschischang,  F.  R.  195
Kubo,  J.  116 
Kulkarni,  S.  217
Kumar,  P.  V.  196.  298 
Kurtas.  E.  208 
Kuruoglu,  E.  E.  443 
Kuznetsov,  A.  V.  128 

L 

Lachaud,  G.  247 

Ladner.  R.  173,  176 

Ui.C.  H.  100 

Lam.W.-M.  217 

Lapidoth,  A.  143,  263,  266

Laroia,  R.  169 

Larsson,  P.  38 

Lazic,  D.  2M 

Lee.C.-C.  169 

Lee,  J.  S.  110

Leland,  R.  P.  314 

Le-Ngoc,  T.  99 

L60,  A.  M.P.  227 

Letaief.K.  B.  274 

Leung,  C.  291 

Levenshtein,  V.  245, 296 

Levitin,  L.  B.  76 

Levy.Y.  139 

Li.  K.  273 

Li,  W.-C.  W.  154 

Li.Y.-X.  236 

Liesenfeld,  B.  399 

Likhanov,  N.  B.  320 

Lin.  M.-C.  419 

Lin.  S.  68.  184,  289,  318,  343 

Lindell  O.  418 

Linder.  T.  65.390 

Linne  von  Berg,  D.  C.  380 

Lipman,  M.  J.  216 

Litsyn,  S.  370 

Liu.C.-C.  186 

Liu.Y.  89 

Livingston,  J.  N.  286 
Loeliger,  H.-A.  180.  182 
Longo,  M.  10, 90 
Lops,  M.  10,87 
Lorenzelli,  F  332 
Lu.C.-C.  200 
Lugosi,  G.  356,431 
Lunn.  T.  J.  67 
Lyons,  D.  F  333 

M 

Ma,  S.-C.  419 
Madhow,  U.  49,  372 
Madhusudhana,  H.  S.  400 
Mandayam,  N.  B.  43 
Mandell,  M.  I.  252
Mao,  R.  349 
Maragos,  P.  427 
Marcellin,  M.  W.  5 
Marcus,  B.  H.  4 
Markarian,  G.  23 
Markman.  I.  271 
Marton,  K.  189 
Masry,  E.  315
Massey.  J.  L.  229.  373 
Massey,  P.  181 
Mathys.  P.  181.232.257 
Matsumoto,  T.  345 
Matsushima,  T.  K.  348 
Mattson,  H.  F  ,  Jr.  371 
McEliece,  R.  J.  142,  252 
McKellips,  A.  L.  310 
McLane,  P.  J.  161 
McLaughlin,  S.  W.  442 


Meeuwissen,  H.  B.  300 

Melas.  C.  M.  126 

Merhav,  N.  72,  265,  266,  352.  420 

Middleton,  D.  86 

Miller,  D.  172 

Miller.  L.E.  110 

Miller,  M.  I.  134,  185 

Mills,  D.  G.  144 

Milstein,  L.  B.  202 

Mittelholzer,  T.  182 

Mittenthal,  L.  233 

Miura,  S.  307 

Modestino,  J.  W.  107,  279,  422 
Modiano,  E.  324 
Mondin,  M.  413 
Montolivo,  E.  37 
Montorsi,  G.  413 
Montpetit,  A.  146 
Morelos-Zaragoza.  R.  H.  184 
Moreno,  C.  J.  145 
Moreno,  O.  145,  298,  358,  405 
Mori.  S.  116.167 
Morii,  M.  34 
Morita,  H.  113,406 
Moulin,  P.  135 
Moura,  J.  M.  F  281 
Muhammad,  K.  274 
Muller,  F  91.  280 
Murad,  A.  H.  102 

N 

Nagata,  O.  397 
Nakagawa,  K.  222 
Narasimhan,  A.  93 
Narayan,  P.  186 
Nasiri-Kenari.  M.  240 
Nelson.  G.  277 
Nelson,  L.  B.  44 
Neuhoff,  D.  L.  333.  435,  442 
Nill.C.  22 
Nilsson,  J.  E.  M.  306 
Nilsson,  M.  364 
Nishijima,  T.  397 
Nobel,  A.  B.  334 
Noneaker,  D.  L.  106 
Norton,  G.  H.  398 
Noviskey,  M.  J.  350 

O 

Oda.  H.  327 
Offer,  E.  19 
Oguz,  N.  C.  317 
Ogiwara,  H.  347 
Ohtsuki,  T.  167 
Oka.  I.  26 
Olser,  S.  241,243 
Olshen.  R.  A.  226,  334 
Onyszchuk,  I.  142,  381
Orcutt.  E.  K.  5 
O'Reilly,  J.  J.  124
Orlitsky.A.  155.213 
Orsak,  G.  C.  14.  206 
Osthoff.  H.  268.  269 

P 

Palazzo,  R.,  Jr.  63 
P41i.  1.  55 
Pan.J.  438 
Panayirci.  E.  290,  357 
Papamarcou,  A.  11
Paris,  B.-P.  14 
Parks,  T.  W.  88 
Parsavand,  D.  374 
Paterson,  K.  G.  408 
Pawlak,  M.  356 
Pei,  P.  405 
Peng.X.-H.  32 
Penzhorn,  W.  T.  103
Pereira,  J.  M.  N.  166 
Perry,  P.  123 
Persson,  J.  288 
Phamdo,  N.  425 


Pless,  V.  365,  366
Polemi,  D.  358 
Pollara,  F.  381 
Poltyrev,  G.  62,  84,  159 
Poor,  H.  V.  15,  44,  423 
Popken,  L.  165 
Popplewell,  A.  124 
Poscetti,  G.  M.  37 
Pottie,  G.  J.  251 
Proakis,  J.  G.  16 
Psaltis,  D.  354,  434 
Pursley,  M.  B.  106,  109 

Q 

Quatieri,  T.  F.  427 

R 

Rabinovich,  A.  141 
Raghavan,  S.  A.  28,  202 
Rajan,  B.  S.  400 
Rajpal,  S.  289 
Rao,  P.  S.  17 
Rao,  T.  R.  N.  304 
Rapajic,  P.  B.  45 
Rasmussen,  L.  K.  384 
Redinbo,  R.  387 
Reed,  I.  S.  302 
Reid,  W.  J.,  III  362 
Ren,  Q.  220 
Rhee,  D.  J.  289 
Rimoldi,  B.  81,  85 
Riskin,  E.  A.  173,  176 
Rissanen,  J.  52 
Rua,  N.  A.  377 
Roche,  J.  R.  214 
Rose,  K.  75,  172 
Ross,  T.  D.  350 
Rossin,  E.  J.  283 
Roth,  R.  M.  4,  35 
Ruf,  M.  J.  391 
Rupf,  M.  373 
Ruprecht,  J.  375 
Rushanan,  J.  J.  405 
Rushforth,  C.  K.  240 
Ruszinkó,  M.  367 
Ryabko,  B.  Y.  57 

S 

Sadowsky,  J.  S.  313 
Safavi-Naini,  R.  234 
Said,  A.  417 
Saints,  K.  401 
Sakaniwa,  K.  36,  292 
Sakata,  S.  363 
Salehi,  M.  208 
Samarasooriya,  V.  N.  S.  421 
Sasaki,  G.  224 
Sasase,  I.  116,  167 
Sato,  Y.  327 
Schaewe,  T.  J.  134 
Schalkwijk,  J.  P.  M.  300 
Schapiro,  B.  76 
Schlegel,  C.  24,  65 
Schneider,  W.  R.  70 
Schulz,  T.  J.  132 
Sendrier,  N.  402 
Šenk,  V.  264 
Seshadri,  N.  183 
Shah,  A.  A.  326 

Shamai,  S.  105,  160,  168,  259,  262,  266 

Shamoon,  T.  341 
Shen,  S.  429 
Shenoy,  R.  G.  88 
Shepp,  L.  130,  154 
Sheppard,  J.  A.  284 
Shi,f.-J.  83 


Shibuya,  T.  36 
Shields,  P.  C.  115,  189 
Shtarkov,  Y.  M.  56,  59 
Shwedyk,  E.  275 
Siddiqi,  M.  U.  258,  400,  407 
Siegel,  P.  H.  35 
Simonis,  J.  148 
Sloane,  N.  J.  A.  60 
Smeets,  B.  156 
Smyth,  C.  J.  225 
Snapp,  R.  R.  354 
Snyder,  D.  L.  120 
Snyders,  J.  25,  84 
Solé,  P.  174,  368 
Solomon,  G.  192 
Sorger,  U.  K.  30,  382 
Stark,  W.  E.  111,  253 
Steinberg,  Y.  423 
Steiner,  M.  204,  403 
Stevenson,  T.  J.  256 
Stiller,  C.  91,  280 
Stokes,  P.  368 
Struik,  R.  369 
Stutzer,  M.  J.  312 
Su,  Y.  328 
Sun,  F.-W.  66 
Sundberg,  C.-E.  W.  22 
Sung,  W.  303 
Suzuki,  H.  218 
Suzuki,  J.  58 
Swanson,  L.  381 
Swarts,  F.  162 
Swaszek,  P.  F.  175 

T 

Takada,  M.  34 
Takata,  T.  68 
Takumi,  I.  297 
Tallini,  L.  7 
Tanaka,  H.  348 
Tassiulas,  L.  221 
Tempel,  D.  J.  275 
Thelen,  B.  J.  136 
Thomas,  J.  A.  3,  215 
Tjalkens,  T.  59,  121 
Tombak,  L.  234 
Tomlinson,  M.  198 
Tong,  L.  188 
Trott,  M.  D.  177,  178 
Truong,  T.  K.  302 
Tsfasman,  M.  A.  249 
Tsybakov,  B.  S.  320 
Tu,  C.  293,  409 
Tufts,  D.  W.  326 
Turmon,  M.  J.  432 
Tzeng,  K.  K.  95 

U 

Udaya,  P.  258,  407 
Udén,  J.  418 
Ungerboeck,  G.  243 
Urbanke,  R.  85 
Uyematsu,  T.  41 

V 

Vaishampayan,  V.  104,  437 
Vajda,  I.  321 
Valdez,  C.  26 

van  der  Meulen,  E.  C.  51,  55 
van  der  Vleuten,  R.  J.  388 
van  Tilborg,  H.  C.  A.  66,  126 
van  Trung,  T.  228 
Varanasi,  M.  K.  42,  46,  374 
Vardy,  A.  29,  61,  370 
Varshney,  P.  K.  421 
Vastola,  K.  S.  325 
Vasudevan,  S.  46 


Venkatesh,  S.  S.  316,  354 
Verdú,  S.  71,  153,  209,  262 
Veugen,  T.  235 
Vinck,  A.  J.  H.  128,  162,  245,  299 
Viswanathan,  R.  424 
Viterbi,  A.  J.  254 
Viterbi,  A.  M.  254 
Vladut,  S.  G.  248 
Vouk,  M.  A.  270 
Vucetic,  B.  S.  45 

W 

Wan,  Z.-x.  179 
Wang,  F.-Q.  415 
Wang,  M.  441 
Wang,  R.-Y.  173 
Wang,  Y.-Y.  200 
Watanabe,  Y.  83 
Weber,  J.  H.  125,  388 
Wei,  C.  330 
Weinberger,  M.  J.  52 
Wicker,  S.  B.  199,  384 
Wigderson,  A.  155 
Willems,  F.  M.  J.  59 
Williamson,  C.  J.  358 
Wilson,  S.  G.  380 
Wilson,  S.  K.  47 
Winick,  K.  A.  237 
Wittke,  P.  H.  310 
Wolf,  J.  K.  6,  202 
Wolfmann,  J.  149 
Woods,  J.  W.  93 
Wu,  J.  343 
Wu,  J.-L.  114 
Wu,  X.  389 
Wu,  Y.  W.  79 
Wyner,  A.  D.  117 

X 

Xia,  X.-G.  293,  329 
Xu,  G.  188 
Xu,  L.  353 

Y 

Yaghoobian,  T.  96 
Yamaguchi,  K.  64,  138 
Yamamoto,  H.  69 
Yamazato,  T.  116 
Yang,  E.-h.  337 
Yang,  G.-c.  378 
Yang,  S.-H.  237 
Yang,  S.-M.  104 
Yao,  K.  205,  332 
Yates,  K.  W.  256 
Ye,  Z.  429 
Yeung,  R.  W.  77 
Yoshida,  K.  292 
Ytrehus,  O.  140,  242 
Yu,  C.-L.  114 
Yuan,  J.-L.  351 
Yuille,  A.  353 

Z 

Zamir,  R.  74,  439 
Zeger,  K.  65,  390,  393 
Zehavi,  E.  160,  239 
Zémor,  G.  150,  370 
Zhang,  J.  92 
Zhang,  Z.  293,  298,  329,  409 
Zigangirov,  K.  269,  288 
Ziv,  J.  117,  352 
Zvonar,  Z.  48 
Zyablov,  V.  V.  23 


